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1. INTRODUCTION 


The object of geodesy is usually defined to be "the determination of 
the shape and gravity field of the earth. " Not being happy with the idea of 
devoting ourselves to a hopeless task, we take the liberty of revising the 
definition to- read: "the estimation of the shape and gravity field of the 
earth. " Faithful to this definition, this work is about estimation, the 
process of extracting information about parameters and/or functions 
from observational data. To extract any information about the physical 
world from observations, it is first necessary to construct a simplified 
and idealized image of the real world, i.e. , a model. 

The complexity of model building is very much dependent on the 
accuracy of the observational techniques . In the presence of large obser- 
vational errors, the adoption of a simplified model is warranted since an 
elaborate model would not compensate for the loss of information associ- 
ated with observational errors . 

Indeed, simple geometric models of axiomatic validity have been of 
great use in geodetic work, although observables could not be directly 
identified with their simple counterparts in the model. This problem was 
bypassed by the appropriate a priori reduction of the observations. 

Although uncertainties were present in these reductions, their order of 
magnitude was insignificant compared to the level of observational noise. 
On the other hand, recent advances in observational techniques related to 
geodetic work (VLBI, laser ranging) make it imperative that more consider 
ation should be given to modeling problems . Uncertainties in the effect of 
atmospheric refraction, polar motion and precession-nutation parameters, 
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etc. cannot be dispensed with in the context of "centimeter level geodesy. " 
Even physical processes that have generally been previously altogether 
neglected (station motions) must now be taken into consideration. 

The problem,. in.essence, is one of modeling functions- of time or 
space, or at least their values at observation points (epochs) . Modeling 
each function value as an independent unknown parameter may be possible 
in some cases (geometric methods in satellite geodesy) , though it may 
generally result in overparameterization; and therefore the interdependence 
of function values must obviously be taken into consideration. When the 
nature of the function to be modeled is unknown, one may resort to 
representing the function in terms of a finite number of parameters using 
polynomials, trigonometric series, step functions, etc (polynomial fits in 
short arc satellite geodetic techniques, truncated spherical harmonic 
expansions of the gravity field of the earth) . The need to include a limited 
number of terms and to a priori decide upon a specific form may result in 
a representation which fails to sufficiently approximate the unknown function. 

An alternative approach of increasing application in several scientific 
disciplines nowadays is the use of stochastic models, i.e. , the modeling of 
unknown functions as stochastic processes. Although the functions under 
consideration are not known, they are, in general, not completely unknown 
either. For example, in spite of the fact that the position of the earth- 
rotation -pole at some current epoch is not exactly known, some places on 
the earth can be considered much more likely candidates than others. This 
situation of relative uncertainty suggests the use of probability as a measure 
of this uncertainty. Thus, stochastic models are means of describing the 
"likely" behavior of the unknown function. 

Relative uncertainty is hardly the usual framework for introducing 
probability concepts . On the contrary, one is usually made aware of these 
concepts through the relative frequency definition [Papoulis, 1965, p. 8]. 
According to this approach, probability theory describes the statistical or 
average behavior of random entities over an infinite ensemble of possible 
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realizations. This poses some disturbing questions about the relevance of 
probabilistic techniques to geodetic problems. Indeed, in geodesy we deal with 
unique functions rather than ensembles of likely-behaving functions whose 
average behavior would call for statistical-probabilistic descriptions. 

Such difficulties are not particular to geodesy and can be circum- 
vented with the introduction of the concept of ergodicity . A function is 
assumed to behave likely over different parts of its domain of definition so ' 
that ensemble averages can be replaced by n time" averages. 

However, the limiting frequency interpretation of probability is not 
the only one. The meaning of probability has been a matter of much con- 
troversy over the last two hundred years [Fine, 1973] . Belated arguments 
have acquired more practical importance as the use of probabilistic ideas 
has spread into so may scientific activities. We need only mention the 
distinctly different interpretation given by subjectivists who even come to 
the explicit aphorism that "probability does not exist" [Finetti, 1974, p. x]. 
According to this school of thought, the uncertainly that probability is 
meant to describe is our own "subjective" uncertainty about reality and not 
inherent to the physical phenomena themselves . This point of view is indeed 
very comforting to the geodesist eager to use probabilistic techniques; for 
even though it does not answer the question of relevance of probabilistic 
concepts to geodetic work, it nevertheless dispenses with the problem 
altogether. The classification of physical processes into random and 
deterministic ones becomes irrelevant; and since stochastic models are 
meant to describe our own indisputably limited knowledge about physical 
processes, their relevance is obvious. 

In essence, the interpretation of probability, although of relevance 
to geodetic work, is more of a problem in philosophy of science. However, 
the justification for the use of probabilistic tools in applied science relies 
on such an interpretation; and one should feel obligated to be at least aware 
of the questions, if not of giving an answer. 
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• Beyond such philosophical considerations, there are some problems 
of practical importance associated with estimation in geodesy that have to 
be addressed. 

Within a- "second -order theory" framework, existing estimation 
techniques call for the a priori specification of the first- and second-order 
statistics (means and covariances) of random parameters and functions • 
involved . 

It has been found that the gravity field of the earth cannot be modeled 
as a random field which is both ergodic and Gaussian [Lauritzen, 1973]. 

This would disqualify the use of statistical sampling techniques for obtaining 
an estimate of the relevant covariance function. 

Even in the case of ergodicity, where stochastic models might be 
appropriate, we encounter situations where the functions to be modeled 
cannot be directly observed, thus prohibiting the use of sampling techniques 
for covariance estimation. 

With respect to the first problem, the probabilistic justification of 
estimate optimality is dispensed with by exposing the deterministic aspects 
of the probabilistically derived estimation algorithm, and strictly determin- 
istic criteria for estimate optimality are established. 

In regard to the second problem, the possibility of parameterization 
of the required statistics of stochastic processes and the simultaneous 
recovery of the relevant parameters along with other unknown parameters 
is investigated. This leads to "adaptive estimation" schemes where the 
stochastic model is adapted to the observational evidence. 

In order to secure the proper use of existing estimation techniques, 
it is considered necessary to bring into light the interrelations (similarities 
and dissimilarities) of such techniques through the use of a unified approach. 
In the hope of contributing to the understanding of estimation techniques, the 
separation between the deterministic aspects of estimation algorithms as 
related to linear best approximation theory and the probabilistic justification 
of estimate optimality criteria arc especially emphasized. 
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2. FUNDAMENTAL CONCEPTS 


2.1 Introductory Remarks 

The purpose of this chapter is to present a short exposition of certain 
fundamental concepts, definitions and results which can be found in a number 
of textbooks of mathematics and probability theory and which are necessary 
for reference in the following chapters. Such an exposition will help to 
avoid a large number of references in the main text, and especially through 
the introduction of a unified notation will hopefully make the following 
chapters easier to read. 

Without trying to conceal or entertain the definite need for more 
mathematical rigor, an honest attempt has been made to bring the discussion 
to a level consistent with the usual mathematical foundation of most geodetic 
work. Whenever a more rigorous treatment is though to be necessary, 
proper references to appropriate works are given. 

2.2 Random Variables— Concept and Mathematical -Model 

The concept of a random variable is the cornerstone of this entire 
discussion. As already explained in the introduction, our concept of a (real) 
random variable is that of a real valued quantity, where uncertainty exists 
about its true value. This uncertainty reflects limitations of one’s subjective 
knowledge about the quantity in question and is completely irrelevant to the 
('•'true, nature” of the quantity. This means that an a priori classification of 
physical quantities and processes into deterministic and random ones is 
meaningless. Starting from the fundamental assumption that a "true value” 
exists, such a value is axiomatically deterministic. On the contrary, our 
image of reality, based upon our imperfect mental and observational 
capabilities, is always governed by uncertainty and is therefore random. 
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These images, or "mathematical models" to use a more familiar term, are 
always governed by uncertainty, even when this is purposely ignored and a 
deterministic mathematical model is used. 

The mere assertion of the- presence of uncertainty falls short from 
justifying the introduction of a mathematical theory of uncertainty. What is 
further needed is a measure of uncertainty, i. e. , something that helps to 
distinguish between "much" and "little" uncertainly, including the limiting 
cases of absolute ignorance (total uncertainty) and complete knowledge (zero 
uncertainty) . 

This naturally leads to the use of measure theory. Measure is a 
function m(A), which maps sets A into nonnegative reals, and furthermore 
m(0) = 0, where 0 is the empty set, i.e, , a set with no elements. 

In its nonmathematical context, measure is one of the first 
"mathematical" concepts that man conceived. It is directly related to notions 
such as long-short, large -small, i.e., to comparison of lengths, volumes, 
time intervals, etc. Unfortunately, in contemporary mathematical educa- 
tion measure theory is a rather advanced topic and therefore a somewhat 
lengthy discussion appears to be in order at this point. 

Returning to the concept of a random variable, let us think of the 
uncertain set of values that the random variable may obtain. This will be 
a (not necessarily proper) subset S of the set of reals R. The concept of 
measure can now be used to answer questions of the form: "How likely is 
it that the value of the random variable belongs to a certain subset of S?" 

This corresponds to assigning a nonnegative real number to the 
subset in question, or more generally to defining a measure on a collection 
of subsets of S. Following a universal convention, the range of this measure 
will be restricted to the finite interval [0, 1] . This leads to the concept of 
probability measure. The two critical questions about probability measure 
that have to be answered next are the following: 

(a) Should this probability measure be defined for all possible subsets of 



S (the so-called power set 7P(S) of S), or for some other collection of 
subsets ? 

(b) What are the properties that probability measure should have? 


Starting with the second question, some obviously desirable properties 

are: 

P x (0) = 0 
P X (S) = 1 

P X (AUB) = P X (A) + P X (B), for A,BCS and API B = 0 


where X denotes the random variable, and P X the corresponding probability 
measure on S . 

Another not so obvious property, but nevertheless necessary in 
connection with the concept of limit and convergence of a sequence of random 
variables, is the following: 


where 



P X (A:) 


00 

Ai C S and Ai H Aj - 0 for i ^ j . 
1=1 


We have now arrived at the concept of a countably additive measure. 
This final property makes the definition of such a measure impossible for 
$?(S) [Hoyden, 1968, p. 53]. Another collection of subsets of S must be' 
found as the domain of P x . Such an appropriate collection is a cr-algebra 
of subsets of S . 

An algebra (or Boolean algebra) of sets is a collection <?£of subsets 
of S, such that 

(i) A U B is in ^whenever A and B are, 

(ii) A° = [t: tES, t£ A 3 is in./? whenever A is. 


An algebra^ of sets is called a a -algebra or a Borel field if every union of 
a countable collection of sets in</£is again in J^fRoyden, 1968, pp. 16-17]. 
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If for simplicity S is identified with R, we are then primarily concerned with 
sets of the form {x; a^x^b}, {x; a<x<b}, {x; a^x}, {x; x < b}, etc. 
Such sets can be constructed from complements and countable intersections 
and unions, of closed- intervals-of R. The smallest o-algebra that-contains- the 
closed intervals of R is called the Borel sets #3 of R. We have now constructed 
a triplet {r, 23 , P^- }, called a (probability) measure space, and consisting of 
the set of possible values R that the random variable X may obtain, and a 
collection 73 of subsets of R, on which a probability measure P x is defined. 
However, the mathematical context of the random variable itself has not , 
been defined yet. The idea of a set of values R "obtained" by the random 
variable leads to the identification of R with the range of a function which is 
consequently identified with the random variable. But a function further needs 
a domain, and such a domain is constructed through the introduction of an 
abstract space Q of elements CO, together with a a - algebra of subsets of & 

and a measure P defined on<?f, i.e. , a probability space P}. 

Now the random variable X can be defined as a function X: R, and 

furthermore a measureable function, i.e . , a function satisfying 
X' 1 (B) * [t:X(t)eB]e^ for every BE 73 

This means that the inverse image under X of every Borel subset of R 
belongs to the cr- algebra subsets of £2 measureable under P. Up to now 
we have a mapping 

X: P} -* {r,233 

but the measure P induces a measure P x on 73 (the measure we have been 
originally interested in) through the relation: 

P X (B) = P(X’ 1 (B)) for every BG 73 

The probability measure P X is called the distribution of the random variable X . 
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2. 3 Integration, Expectation and Moments of a Random Variable 


A concept related to measure is that of integration. As a matter of 
fact, it was questions in integration theory that historically gave rise to 
the study of measure theory. An exposition of integration theory is beyond 
the scope of this work, and we shall therefore take for granted knowledge 
of the definition of integrals of the form 


j"(p(co) dP(co) , AEsfc and 
A 


/ 


<P( w) 


d P(o>) 


Cl 


where <p (to) is a measurable function <p: Ci R. 

Definitions of these integrals and related discussions on integration 
and measure can be found in several texts [Pitt, 1963; Royden, 1968; 
Kolmogorov and Fomin, 1960; Burril, 1972]. For the random variable 
X: {Cl,^i, p} [r,?3, P x 3, we have 


j x(< 0 ) dP(«) 


j x dP x (x) 


( 2 . 1 ) 


Cl R 

the latter being a Lebesgue integral. To transform this integral into a 
Riemann-Stieltjes one, we need a description of the measure P^. Such a 
description becomes possible through the introduction of the distribution 
function F x (X) of the random variable X, defined as 


F x (X) = P{co; X(to) = P x (-oo, X] 


When F X (X) is properly defined, then P^ is defined for all sets of the form 
(-oo, X], and consequently for every set BG! 23. This follows from the fact 
that B being a Borel set can be written as a union and/or intersection of as 
many as countable sets or complements of sets of die form (-oo, X], and 
from Px being countably additive. 
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Now the expectation of a random variable X is defined as 


E {X| 


‘ / 


+ 00 


X(tO) dP(CO) = 


Jx dP x (x) = J 


O 


R 


- 00 


X dF x (X) 


( 2 . 2 ) 


the latter being a Riemann-Stieltjes integral. The definition can be 
extended to any function <p(X), such that the composite mapping <p°X: fi^R 
is a measurable function 

+ oo 

E { <P( X ) } = j <P(X ( «)) dP(CO) = j <P(\)JdF x {\) 


a 


— 00 


(2-3) 


In the particular case that f^(X) ~~ F x (X) exists and is continuous, 
expectation can be defined by means of a simple Riemann integral 


+ 00 


E \<P(X)\ 


- /• 


(X) f x (X) dX 


(2.4) 


— 00 


f^(X) is called the (probability) density function of the random variable. 

Among the admissible functions <p(X), the following are of most 
interest: 

(a) The identity function <p (X) = X, giving rise to the mean of the 
random variable X 

pt x = E {x} 

(b) Functions of the form <p(X) = (X - |U x ) n , n - 1, 2, . . . , giving rise to 
the nth central moment of X 

M x (n) = E{(X -M x )“} 

For n = 1 we have trivially - 0, while for n - 2 we obtain the 

variance ff x of X: 

0 ^ = E l(X - p x ) 2 } = E {x 2 } - /i x 


10 



Not any arbitrary function F^(X), (or f^(X)), corresponds to some random 
variable X. F X (X) must have the properties 

F x (- ro ) = 0, F x (+«) = 1, F x (a)S F x (b) fora < b 

(i.e. , F X (X) is nondecreasing), and if ax < a 2 < ... < a n < . . . is a 
sequence of reals with lim an = a, then 

n->oo 


lim F v (a n ) = F v (a) 

n-^oo A A 


i. e 


• f 


F x is continuous in the left. 


Up to now only a single random variable has been considered. For a 
number of random variables X x , X 2 , . . . , x„, we can construct a vector 
X - [X x , X 2 , . . . , X n ] and the mapping X: to -* R n is called a random 
vector. 

We have taken here all the random variables Xj to he defined on the 
same underlying probability space p] . 

For a real valued function <p(X), (<p: R“ -> R), the corresponding 
composite function <p (X (to )), (<p of: to -*■ R), is a random variable (provided 
<poX is measurable), and its expectation is simply defined as: 

E|«p(X)( = y*<p(X(to)) dP(to) (2.5) 

fl 


If we define tbe distribution function 

F x ( X) = F — (Xi , X 2 , . . ,X n ) = P { <0 ; Xx (co) ^ X x , . . t X n (w) ^ X n | 

(2-6) 


and if the density function 
*X< X > = • • , X n ) 

exists, then 


^ F x ( Xx , X 2 , . . . , X n ) 
S Xx h X 2 . . . d X n 


(2.7) 


11 



E J <P(X) \ 


+00+00 ^00 

//•••/ 

— 00 — 00 — 00 


<P(X) f^(X) dX-1 dX 2 ... dX n 

A 

( 2 . 8 ) 


2.4 Stochastic Processes 

The concept of a random variable is the probabilistic counterpart of 
the deterministic concept of a real valued constant. A similar counterpart 
to the deterministic concept of a function is provided by the concept of a 
stochastic process (or random process, or random function). 

We now introduce the following notational convention: For a given 
function y(t, s) (y: T x s Y) of two variables t ET and s E S, we denote 

^ g g 

by y and y the mappings y : S-* Y and y : T“» Y resulting by fixing t or 

s respectively in y to a constant value. 

A stochastic process £ (t, to), t ET, wEflisa mapping £: T X £2“> R, 

where T is an index set and {£2 ,</f, p} a probability space. For every fixed 

tET, the corresponding mapping £*: £2 R is a random variable, while for 

fixed to the mapping £ W : T - * R is an ordinary function with domain T and range 

in R. For the various fixed to values the corresponding deterministic functions 
to 

£ (t) are called the sample functions of the stochastic process. 

If <t> denotes the set of functions <p: T R, then a stochastic process 
£ (t, to) can be alternatively viewed as a mapping £: £2 i.e. , as a 
"function valued" random variable. In our discussion we shall take T to be 
the set of reals R or some interval in R. If T is the set of positive integers 
N + , the term random (or stochastic) sequence will be used. If T E R n , then 
the corresponding mapping £(t, to) will be called a random field. 

In a description of a stochastic process, the probability measure P 
corresponding to the probability space {£2,«/£, p} is usually not given. 

Instead the joint distribution function of the random variables ^(to), £^ s (to), 

. . . , £ t# (t0) is given for any finite set of values ti, ta, ...» t n in T: 
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’£,tl,t s ,..,t 8 ( X1,XS Xn) - 

= P | «; £*<"> * X lf £ ts (<o) * X 2 £ tn <«) * X n } 

(2.9) 


Such a given family of distribution functions corresponds to some stochastic 
process, provided the two following properties are satisfied [Kolmogorov, 
1950]: 

(a) If pi, p 2 , . . . , pn is a permutation of the indices 1, 2, . . . , n, then 


^ 5 tp , Lp ? , Up 1 2 n 


1 2 


4»ti,tg,.,.j 


(b) If we let the variables X j+i, X j+2 , 
then 

ti ,t2, . . ,tj ,tj+i , . . . ,tn^ 1, ^ 3 » • 


. ( Xi,X 8 , ... ,X n ) (2.10) 

I'D 

. . . , X n approach infinity (1 £ j < n), 

X . 00 00 00 \ = 

• • ) » »•••» ; 


tl ,t2 , • • • ,t j 


( ^-1 » X2 , • ♦ • , X J ) 


(2. 11) 


The above finite dimensional distribution functions of the process determine 
a good deal of the structure of the probability measure P but by no means 
all of it. For a discussion on this interesting problem we refer to [Lauritzen, 
1973], where further references to the literature are also given. 

With the help of the distribution functions or the corresponding finite 
dimensional density function (provided it exists). 


? j tl »t2 , « • • »tn 




( ^-1 , X2 , . . . ,X n ) 

S» tl ,t2 J • ♦ . ,t n 


( 2 . 12 ) 


dXi SX 2 ... 3X, 
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we can now define a few more useful concepts . 


The mean value function p. t (t) of a stochastic process £ (t, w ) * s 

s 


defined as 

M^(t) = E{S(t, co)[ = 


/ 


R 


\ dP s>t (X) - 


h «... 


(X) dX 

(2.13) 


t 


Strictly speaking, for a fixed t value, the (constant valued) mean of the 
random variable £^(0) is defined in the usual way, and then the function 
jn^(t) is constructed by letting t vary over the whole set T. 

The (auto) covariance function C^(t, s) is a mapping C: T x.T - * R, 
defined as 

C^(t,s) = E | [ £*(«)- M^(t) ] [ £ S (co)- s) ] j. 

(2.14) 


The (auto)correlation function is a similar mapping defined as 

&£(t,s) = E | ^(CO) £ S (co) j> (2.15) 

A stochastic process with finite (auto)covariance is called a second- 
order stochastic process. Obviously, 


C^(t,s) = R^(t,s) - M^(t) M^(s) (2.16) 

Given any stochastic process £(t, to), a new stochastic' process £(t, co) = 

^ (t, co) - ju^(t) can be constructed with the properties 

/i£(t) = 0 and C^(t,s) = R-|-(t,s) 

Given two stochastic processes x(t, co) and y(t, CO), their cross- correlation 
and cross-covariance functions are simply defined as 

R (t,s) = E |x t (co) y s (co) } (2.17) 

and C xy (t,s) = . E { f ^(co) - M x <t)] [ y S (co) - *i y (s) ]( 

(2.18) 
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The autocovariance function has the following two properties: 

(a) Symmetry: C^(t, s) = C^(s, t) 

(b) The covariance function is positive in the sense that 


EL** a < 

i ) 


C ? (t lf tj) ^ 0 


for every set t 1# te, . . . , t D G T, and any set of constants ai, a 3 , . . . , 
a» GE*. 

In a more familiar form, if we consider a matrix C with elements 
Ca = C^(ti, tj ) and any constant vector aER°, then 

C T = C and a T C a ^ 0, 


i. e. , the matrix C is symmetric and nonnegative definite. 


2. 5 Stationarity and Ergodicity 

If the index set T of a stochastic process 4 (t, co), t G T is taken to be 
the real line R, and its physical interpretation that of time, then questions 
about the time invariance of the probabilistic structure of the process lead 
to the concept of stationarity. 

A stochastic process 4 (t, CO) is called strictly stationary if the finite 
dimensional distributions of the process have the property 


t f t > • • • » ^-n) 

S > ^1 j • • • 5 tn 


F 4,t 1 +T,...,t >1 +T (Xl Xn) 

(2.19) 


for every T g T = R and eveiy finite set ti, t®, . . ., t„. 


A second-order stochastic process 4(t, CO) is called weakly stationary 
(or wide sense stationary, or second-order stationary) if the mean value 
function )i^ (t) and the* (auto)correlation function of the process R^ (t, s) have 
the following properties: 

= ^( t + T ) and R^(t,s ) = R^(t + T,s+T). for every tGT. 
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This directly implies that (t) - = const., and that 

R^(t, s) = R^(t - s, 0) = R^(t-s) 

The same property follows directly for the (auto)covariance function 

3 

C^(t,s) = R(t-s) - = C^t-s) (2.20) 

Every strictly stationary stochastic process is also a weakly stationary 
process, but the converse is not generally true. A class of stochastic 
processes, such that weak stationarity implies also strict stationarity, is 
that of Gaussian stochastic processes . 

A stochastic process £ (t, to) is said to he Gaussian if for any set 
ti, tg, . . . , t n G T, the corresponding random variables 

have a multivariate Gaussian joint distribution. 

Another important concept associated with the physical interpretation 
of the index set T as time is that of ergodicity. Probabilities, according to 
the limiting frequency approach at least, are associated with infinite ensem- 
bles of events . However, we frequently have to deal with a unique process and 
not an ensemble of such processes . For stationary processes whose 
probabilistic behavior is invariant with respect to time transformations, an 
infinite ensemble may be conceptually constructed from time shifts of the 
original process. Ensemble averages can thus be replaced by averages over 
the time domain. 

A stochastic process £ (t, w ) is ergodic if its probabilistic structure 
can be determined from a single realization £ W (t). We need concern our- 
selves here only with the correlation function of the process. An estimate of 
the correlation function of a stationary process can be obtained by averaging 
a single realization over a time interval [-T, T] 


1G 



R t ( t ) 


1 


€ (t + T ) £ (t) dt 


( 2 . 21 ) 


+T 


2T 


/ 


-T 


Obviously, 

E (R t (t)} = R(T) (2.22) 

and if 

lira' » t (t) = R(T) . (2.23) 

^ CO 

the process is said to be ergodic in (auto)correlation. 

2.6 Linear Spaces — A Geometric Overview 

A great deal of mathematical entities of a diverse nature (real 
numbers, vectors, functions, random variables, etc .) have a lot of structure 
and properties which only appear to be different. Their similarities can be 
brought to light if we deprive them of their particular characteristics and 
study them in a unified way based on their common ones . To identify such 
common characteristics a great deal of abstraction is necessary. Qualify- 
ing mathematical entities may be viewed as "points" in a "space" which is 
an abstraction of our familiar three-dimensional Euclidean space. The 

abstract counterparts of common geometric notions such as distance, length, 

* 

angle, orthogonality, projection, etc. become then powerful tools of analysis. 

We shall present here only a short account of some basic "geometric" 
results from linear algebra and functional analysis, culminating with the con- 
cept of a Hilbert space. 
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A linear space X is a collection of mathematical entities which we 
shall call elements, x, y, z, . . . , together with two operations called 
addition and multiplication by a scalar, satisfying in connection with the 
field of reals R the following axioms': 

(a) x + y = y + x 

(b) x + (y + z) = (x + y) + z 

(c) there exists an element f(6X, called the null element such that 
x + 0 = x for every x in X 

(d) for every x£X, there exists a unique element (-x) such that 
x + (-x) = 0 

(e) a(bx) = (ab)x for all x G X and a,bG R 

(f) a(x + y) = ax + ay 

(g) (a + b)x = ax + bx 

(h) lx = x, 1£R 

The fundamental property of a linear space is that linear combina- 
tions of any finite number of its elements are also elements of the space 
(linearify property) . 

A set of elements xi, i - 1, 2, . . . , n of a linear space is said to be 
linearly independent if the relation 


n 



i-1 


holds only if ai = 0 for all i. 

A set of elements xi of a linear space X is said to span X (or to be a 
spanning set of X) if every element in X can be expressed as a linear combi- 
nation of the elements Xi. 

A set of linearly independent elements which also spans a space X is 
- • said to be a basis for X. 
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The number of elements in a basis of a linear space X is called the 
dimension of the space . Alternatively, the dimension of a space is the 
maximum number of linearly independent elements in the space. 

A linear space X is called a normed linear space if to each xGX 
corresponds a real number |[x j| having the following properties: 

(a) |] x || * 0 

(b) || x || = 0 if and only if x = 0 

(c) ||ax|| = | a | ||x|| for every aGR 

(d) || x+ y || 5 | x|| + |j y 1 (triangle inequality) 

|[x|| is called the norm of x. 

A metric space is a set of elements such that for every pair of 
elements a real valued function d(x, y) is defined, called the metric of the 
space, and having the following properties: 

(a) d(x, y) * 0 

(b) d(x, y) = 0 if and only if x = y 

(c) d(x, y) = d(y, x) 

(d) d(x, y) + d(y, z) ^ d(x, z) 

The metric is an abstraction of the usual concept of distance while the norm 
is an abstraction of length. Every normed linear space is also a metric 
space with the following definition of the metric 
d(x, y) = | x - y | 

A linear space X is called an inner product space if for every pair of 
its elements a function <x, y>, called the inner product of X, is defined with 
the following properties: 

(a) < x + y , z > = < x, z > + < y, z > x, y, zGX 

(b) <x, y > = < y, x> 

(c) <ax, y> = a<x, y> aGR 

(d) < x, x > ^ 0 and < x, x > = 0 if and only if x = 0 

Every inner product space is also a normed linear space with norm defined 
as || x || = < x, x > and, consequently, also a metric space. 
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It is essential to realize that the definition of a certain inner product 
(or norm, or metric) is by no means unique for a certain linear space. On 
the contrary, it may be possible to define a number (not necessarily finite) or 
inner products (or norms, or metrics) over one and the same linear space, 
thus giving rise to a number of different inner product spaces (or normed 
linear spaces, or metric spaces). 

The angle 0 between two nonzero elements x, y of an inner product 
space X is defined by means of 

cos 0 = < x, y > / 1| x || || y || 0 ^ 0 ^ n 

For 0 = 7T / 2 we have <x, y> = 0, and we say that x and y are mutually 
orthogonal (x -l y). 

A set of elements xi of an inner product space X is called an 
orthogonal set if < xi, xj> = 0 for i / j and < xi, xi> ^ 0. A similar set 
xi* is called orthonormal if it is an orthogonal set and, in addition, 

< xi*, xi*> = 1. A basis with orthogonal (orthonormal) elements is called 
an orthogonal (orthonormal) basis. 

Given a set of orthonormal elements xi* of an inner product space X 
and an arbitrary element y£X, we call the series 

J]<y,x*> x* 

i 

the Fourier series of y with respect to the set xi*. For an orthogonal set 
xi, the Fourier series of y is 

< y. x t > 

Xi 

II 

Given two elements x, y of an inner product space X, we call the element 
II x || -3 < y, x > x, the projection of y on x. The Fourier series of an 
element y with respect to an orthogonal set xi, is therefore the sum of the 
projections of y on the elements Xi. 
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A subset M of a linear space X is called a linear subspace of X, if M 
is itself a linear space. 

Given a sequence x n of elements of a metric space X, we say that the 
sequence converges to an element x of X if lim d(x n , x) = 0. This is con- 

n-»“> 

vergence in metric, and since normed linear spaces are also metric spaces, 
we can also define convergence in norm by lim || x - x n || = 0 . 

A sequence of elements xi of a metric space X is called a Cauchy 
sequence, if for every € > 0 there exists an integer N e such that 
d(x„, x n ) ^ e for all m, n & N e . 

A metric space X is called complete if every Cauchy sequence in X 
has a limit also inX, i,e. , if it contains the limits of all its Cauchy sequences . 
Since inner product and normed linear spaces are also metric spaces, we can 
define two new types of spaces: 

A complete normed linear space is called a Banach space. 

A complete inner product space is called a Hilbert space. 

Let X be a metric space and S a subspace of X. The closure of S 
denoted by S is the set of all limits of convergent sequences of S. A subset 
S of the metric space X is called closed if S = S, i.e., if it contains the 
limits of all its convergent sequences . A subset S of the metric space X is 
called dense inX if 5 = X. A metric space X is separable if there exists a 
countable dense set in it. A separable Hilbert space contains a countable 
number of elements such that the subspace they span is identical to the 
Hilbert space. 

Let H be a (separable) Hilbert space and M a closed linear subspace 
of H. An element x £ H is said to be orthogonal to -M (x J- M) , if it is 
orthogonal to every element y of M. The set M x of all elements of H 
orthogonal to M is a closed subspace of H, called the orthogonal complement ' 
of M with respect to H. One can show that for every closed M, H is the 
direct sum of the orthogonal subspaces M and M x , and we denote this by 
H = M ® M x , This means that any element x of If can be decomposed In a 
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unique way x = x + x 1 , so that xGM and x x E M 1 . x is called the projec- 
tion of x onM, and we denote this by x =P M (x). Wealsohave x x =x-x = ^\t x ( x )- 
Let Xo be a fixed element of a linear space X, and let Xi, i = 1, 2, 

, n be. a.set of linearly- independent -elements- of-X. The set of- all linear 

combinations of the form 


Xo + 



a i xi, 


ai£R (2.24) 


is called a linear variety of dimension n. A linear variety is the space of 
elements x = x Q + x where x ' belongs to M, the subspace of X spanned by 
the set xi. A linear variety V may be viewed as a translation of a linear 
subspace M by a fixed element x 0 , and we write symbolically V = x 0 + M. 

Let Xi be a set of n linearly independent elements of an inner product 
space X and ci a set of n real constants . The set of all elements y of X 
satisfying 

< y, Xi > = ci for all i (2-25) 

is called a hyperplane P of co-dimension n. 

Let y 0 be a fixed element of P. For every element y of P we have 
< Xi, y - y G > = <xi, y> - <xi, y Q > = ci - c t = 0. It follows that 
y - y 0 -L M, where M is the sub space spanned by the set xi. Every element 
y EP can be written in the form y = y e + y where y = y - y 0 EM 1 , and 
thehyperplane P can be viewed as a linear variety p = y 0 + M 1 . 

A functional is a mapping with domain a linear space X and range the 
set of reals R. A linear functional is a functional L with the property 
L(ax + by) = aL(x) + bL(y) where x, y£X and a, b ER. If X is a normed 
linear space, a linear functional L on X is said to be bounded if there exists 
a constant M ER, such that L(x) ^ M || x || for all x EX. A linear functional 
L over a normed linear space X is said to be continuous if for every sequence 
x n of X converging in norm to x EX, L(x n )-* L(x) in the usual sense. A 
linear functional over a normed linear space is bounded if and only if it is 
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continuous . Given a normed linear space X, the set of all bounded linear 
functionals over X is itself a normed linear space X*, called the normed 
conjugate or dual space of X, with norm defined as 

I L(x) I 

I L 1 = sup (2.26) 

x E X || x || 
x * 0 

For any linear bounded functional L over a Hilbert space H there exists a 

unique element x of H, called the re presenter of L, such that L(x) = 

L 

< x L , x > for every xGH. We usually denote the dual space of H by H* and 
write x*, y*, z*, ... for elements of H* with representers x, y, z in H. 

Let H be a Hilbert space with inner product <*,*>, H its dual, 
and (£2,</£, P) a probability measure space. A mapping x: H is said to 

be a Hilbert space-valued random variable if 

y*(x° > ) = <x“, y> (2-27) 

is an ordinary random variable for every y* E H* (or equivalently for every 
y EH, where y is, of course, the representer of y*). 

2. 7 Best Linear Approximation and the Normal Equations 

The problem of best linear approximation may be defined as follows: 
Given a normed linear space X and a closed linear subspace M of X, find 
the element y of M which best approximates a given element x of X, in the 
sense that 

|| x - y |l = min II x - y || (2.28) 

y EM 

To solve the problem in a Hilbert space, consider the unique decomposition 
of x into itej projections on M and M- 1 : 

x = x + x', x =P M (x) f x =P m -l(x) 

For any arbitrary element y E M, we have, taking into account that xix', 



I x - y| 


= X' 


+ x-y 


(2-29) 


The first term is fixed and the second, nonnegative; and therefore ||x — y || 2 
is minimized by setting y - y = x = ^ > M -(x). The projection of x on M is- 
the closest element to x among all elements of M in the sense of the norm 
ofX. Furthermore, the best approximation always exists and it is unique. 

If X is an inner product space and M the span of a finite set of n 
elements x t , the best approximation x to an element x of X from M can 
be written as 



1 

where the n coefficients at are to be determined. Since xGM and 
x - x EM 1 , we have x - x r xi for all i; and <x - x, xi> = 0, or 


y] a j < xi , x j > = < Xi , x > (2.30) 

i 

These are the "normal equations, " and can be written in matrix form as 
follows: 


< 

XI , Xl 

> 

< 

xi, x 2 

^ • • • • 

< Xl 

< 

xa , xi 

> 

< 

X2,X 2 

^ • • • • 

< X2 


• 



• 

• 



ft 



ft 

• 



• 



* 

« 



* 



• 

• 


< 

X n , XI 

> 

< 

X*, Xs 

• 

< x n 


or in compact notation 

Ga = u 




— 


— — 

, X B > 


ai 


< Xl , X > 

, X n > 




< X2, X > 

• 


• 



ft 


• 


ft 

* 1 
ft 


• 


ft 

ft 

, X„ > 


3- n 


< x n , X > 


(2-31) 


where G is called the Gram matrix. If M has dimension n then the elements 
xi forming a basis for M are linearly independent, and the Gram matrix G is 
nonsingular. A unique set of coefficients can be obtained by 


a = G*u 


(2-32) 
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The distance of x from its best approximation x (interpreted as the "error 
of approximation") is provided by the induced metric 

d(x, x) = || x - x || 

It can easily shown that 

|| x - x |i 3 - || x || 2 - || x |J 2 = | x || 2 - ai a J <Xl » x J > 

i J 

= II x || 2 - a T G a = | x f ~ u T G' 1 u (2.33) 

Z 

If M is of dimension less than n, then the Gram matrix is simgular; and the 
set of coefficients ai cannot be uniquely defined. However, the best approxi- 
mation exists and is unique; and the normal equations have infinite number of 
solution aj, each of them giving rise to the same best approximation x = 

S ai xi . 
i 

A related problem is that of finding the element of minimal norm 
among the elements of a linear variety or a hyperplane. This can be viewed 
as a problem of best approximating the null element 0 from a linear variety 
or hyperplane . 

Given an inner product space X and a linear variety V = Xo + M in X, 
we can decompose each element y of V as 

y = y + y'» y =2 p M (y). y s P M i(y) 

Since yJ-y', 

II y II 2 = lly+y'll 2 - |y|* + lly'll 2 (2-34) 

We can farther show that y is the same for every y £V, as follows: 

Consider two elements yi = Xo + yi, y 2 = Xo + y 2 of V, where yi, yia G M. It 
follows that yi - y 2 = yi - y 2 G M. Decomposing as usual, 

yi = yi + yl, ya = ys + ys, yi, y 2 G m, y(, y 3 G 
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we have that 


yi-ysGM, yi - y 3 G M 

and 

yi - ys - (yi - y 2 ) - (yx - y 2 ) G M 

But since y(, y 2 GM X , we also have y( - y' z 6I 1 , This is only possible if 
yi - Jz = 0, and y[ = y' z . 

To minimize 

II y II 2 = lly II 2 + lly'll 2 V-™) 

where || y'H 2 is fixed and jj y || 2 is nonnegative, we must set y - 0 . It follows 
that the element of V with minimal norm is the unique element 

y = P M i(y) for every y G V. 

In the case of a hyperplane P described by 

< y, x t > = Ci i = 1, 2, . . . , n y G P 

we can write P as a linear variety P = y 0 + M 1 , where y 0 is a fixed element of 
P and M is the span of the set Xi. The element y of P with minimal norm is 
then given by the unique projection of any elem ent y G P on (M x ) x = M 

y = ^M <y) 

To determine y , consider any element y of P. Then y = 3^^(y) = 

E ai xi can be found with the help of the normal equations 
i 

E aj < xi, xj > = < xi, y > (2-36) 

i 

Since y G P we have < Xi, y> = c u so that the normal equations become 

E aj <xi, Xj > — Cl (2-37) 

J 

or, in matrix notation, Ga = c. The elements xi are linearly independent so 
that a = G 1 c uniquely determines y - E aixi. 
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The importance of the normal equations in solving estimation problems 
can hardly be overemphasized, to the next chapter we shall apply them in 
deriving a number of apparently different algorithms . 
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3. ESTIMATION TECHNIQUES. 

3. 1 Introductory Remarks 

The objective of any analysis of geodetic data is to obtain optimal 
values for observables differing from actual observations because of 
measurement errors, for parameters functionally related to the above 
observables, and, in the most general case, functions related to observables 
and/or parameters. The essential question, of course, is what are the 
optimality criteria to be satisfied by such optimal values. 

Before establishing specific optimality criteria, we must notice that, 
in general, the optimal values sought are associated with some corresponding 
"true values. " Although such true values are difficult to define and even 
impossible to materialize in practice, we can nevertheless draw the following 
general outline for any reasonable optimality criterion: Optimal values should 
be as close as possible to true values . 

Both optimal and true values may therefore be modeled as elements in 
an abstract mathematical space (i.e. , a set of such elements), where, in 
addition, the concept of distance is defined enabling us to determine how "close" 
any two elements of the space are. But such a mathematical model is pro- 
vided by the concept of a metric space where a metric or distance is defined 
for every pair of elements. The choice of a specific metric corresponds to 
the choice of a specific optimality criterion. 

We shall mostly confine ourselves here to a specific (but nevertheless 
of wide applicability) kind of metric spaces, namely, linear complete 'inner 
product spaces (Hilbert spaces), where the metric p(x, y) follows from the 
definition of the corresponding inner product through the relation 

P(x,y) - || x-y || s j< x - y, x - y >j 8 
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la such a mathematical setup, we shall find our problem of obtaining optimal 
estimates to correspond to the problem of best approximating an element of 
the space from the elements of some linear subspace. The solution to this 
problem is given by the celebrated normal equations, and computational 
algorithms will always result from applications of this solution. The result- 
ing algorithms may well be drastically different, but our common approach 
of solution will provide us with a unified view, where similarities and dis- , 
similarities between what is usually considered to be different estimation 
techniques can be sharply identified. 

Such an analysis can, of course, take place at the algorithm level, 
where by proper algebraic manipulations one can show how different 
algorithms can be derived from a common starting point. Such approaches 
can be found, e.g., in [Liebelt, 1967, Chapter 5] and in the recent work by 
Krakiwsky [1975] . It is our belief, however, that little is to be gained from 
approaches where the connection between different algorithms is explored 
without reference to a unique underlying problem formulation and solution. 

It is hoped that the approach taken here will contribute to the understanding 
of varying estimation techniques, especially since the geometric character 
of Hilbert space techniques is more appealing intuitively, after familiariza- 
tion with these mathematical tools is attained. 

We have used the term estimation techniques repeatedly here, although 
estimation is usually connected with statistical and probability concepts com- 
pletely absent from our discussion. Indeed all the probabilistic notions one 
needs can be condensed in the definition of the inner product involved, and 
they are only involved in justifying the optimality of the corresponding metric 

as an appropriate measure of "closeness. M 

/ 

A more appropriate term might have been "adjustment techniques," 
but unfortunately this term is already connected to some specific algorithms. 

The solutions to our problems are quite independent of the specific 
inner product definition. However, the corresponding computational 
algorithms result by replacing the specific inner product into the general 
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normal equations solution, and are, therefore, related to both probabilistic 
concepts and deterministic mathematical tools from approximation theory. 

3.2 Least Squares, Adjustment 

Among various estimation techniques, least squares adjustment is the 
one most widely used in geodesy since the time of its development by Gauss . 
There is, therefore, very little to be said about algorithms associated with 
least squares . Usually these algorithms are derived from variational 
principles, as solutions to the problem of minimizing a quadratic form. Such 
an approach solves the problem but has little to offer to the understanding of its 
mathematical context and its relation to other techniques. 

Instead, we shall approach here the problem from a Hilbert space 
point of view, and we will show how particular algorithms can be directly 
derived from solutions to problems already considered in Chapter 2. 

The notation will generally follow that of Uotila [1967], with some 
minor changes of signs, so that our results can be easily compared to the 
variationally derived algorithms iii that work. 

3.2.1 Method of Observation Equations (Adjustment by Elements) 

The problem to be solved here can be described as follows: A finite 
set of unknown observables represented by a vector L* G R°, is a priori 
known to be in a linear functional relation 

L* = AX (3.1) 

with a vector X G R u representing a finite set of unknown parameters, where 
A is a known n x u matrix with n > u. 

A vector L of available (known) observations L differs from L* 
because of the presence of unknown observational errors V, according to 

L = L* + V or L = AX + V (3.2) 

This linear matrix equation (observation equations) is satisfied by an Infinite 
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number of pairs of values of the unknown vectors X, V, since to every element 
X in R u corresponds some element V = L - AX in R n . Among all such possible 
pairs of solutions X, V, we seek to find one X, V = L - AX, which minimizes 
the quadratic form 

V T PV = (L - AX) T P (L - AX) (3.3) 


where P is a given positive definite n x n matrix . 

Setting y = AX, y.- AX, we can reformulate our problem as follows: 
Let/' be the n-dimensional complete inner product space (Hilbert space) with 
elements n x 1 real vectors and inner product 

< f, g> = g T Pf f, g e Z (3.4) 

Let A t denote the i^ 1 column of the matrix A. Obviously LEZ , AiE^ 
for i = 1, 2, . . . , u and since 


y * A X = 



it follows that yEt/^, where span(Ai, Aa, . 
criterion V T P V = min can now be written as 


(3.5) 

Au) . The least squares 


V T P V ' = < V,V > = || V || 3 = || L - y II s = min 

(3.6) 


We seek to find yE & such that 

|| L - y H = min || L - y | (3.7) 

y£/ 

But this is a problem of best approximating LE/, from the elements of the 
linear subspace 

The solution is well known to be provided by the normal equations 
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< Ai , Ai > <Ai.,A 2 > < Ai , A u > 


A 

Xi 


< L, Ai > 

< A 2 , Ai > ^AgjAs^ ..... < A 2 , Au > 

• » • • 

- 

x 2 

• 

• 

' = 

< L, A 2 . > 

• 

• 

• • • * 
* . * « 
* . • * 


• 

• 

« 

Xu 


• 

9 

• 

< L, A u > 

_ 


_ . 


_ _ 


(3.8) 


To derive a computational algorithm from this solution, we make use of the 
inner product definition; and after replacing < Ai, Aj > = Al PAj = Aj PAi 
and < L, Ai > = Ai PL, we obtain 


Ai P Ax 

A 1 P A 2 . • ♦ 

A T iP Au 

— ” 
A 

Xi 


A T i P L 

A 2 P Ai 

As P Ag * * . 

* • 

As P Au 

• ! 
• 

A 

X 2 

• 

# 


a t 2 p l 

• 

# 

A t u P A x 

• « 

Au P Ag ... 

• 

Au P A u 

• 

Xu 


m 

Au P L 


(3.9) 


or 


Ai 

A2 


P [ Ai A2 



A u 1 X 


A T i 


As 


A T U 


P L 


or simply 

(A T PA)X = A t PL 


(3.10) 
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If the columns of A are linearly independed, then A T PA is a nonsingular 
matrix so that we may obtain 

± = ( A T P A ) _1 A T P L (3.11) 

y = A X = A ( A T P A)' 1 A T P L (3.12) 

V = L-y * L-A(A T PA)" 1 A T PL 

(3.13) 

a set of well-known results'. Compare, e.g., with [Uotila, 1967, Section 
3.1]. 

- /v 

The solution is illustrated in Figure 3. 1. The Xfs are the coordi- 
nates of the best approximation y to L, from the sub spaced, with respect 
to the basis [At } in this subspace. We alsobave ^ as it can easily be 
verified 

V T P A = ( V T P A 1} V T P A 3 ,..., P A u ] = 

= (L t - L t P A (A t P A)' 1 A T ) P A = 0 (3.14) 

i. e. , V T P Ai = 0 = < Aj, V > or V J. At i = 1, 2, . . . , u, and 
V 


► 4 * , ? 

3.2.2 Condition Equations 

In this problem a set of unknown observables L* £R" is a priori 
known to satisfy a linear relation 

BL* s 0 • • (3.15) 

where B is a known m X n with m < n and rank (B) = m. A corresponding 
set of known available observations L E If differs from L* because of the 
presence of unknown observational errors V E if, according to the relation 

L = L* + V (3.16) 
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Since B and L are known, the vector W = B L is also known and in view of 
BL* = 0, it satisfies the relation (condition equations) 

BV - W (3.17) 

It can be shown that since rank (B) = m< n, the matrix equation BV = W has a 
solution and, furthermore, an infinite number of solutions . Among all 
vectors V satisfying BV = W, we seek to find one V which minimizes the 
quadratic form V T P V where P is a given positive definite nx n matrix. 

To reveal the geometry of the problem in a Hilbert space context, 
we use the transformation 

~ -It 

B = 'P B T . 

(B is now n x m), so that 

W = BV = BP 1 P V = B t PV (3.18) 

If Bi denotes the it* 1 column of B, we have 

B T jPV = Wt (3.19) 

Introducing the Hilbert space ^ of nx 1 real vectors with inner product 

<f, g> = g T Pf g, tel . (3.20) 

we can write the above relation in the form 

< Bi, V > = Wi (3.21) 

Since rank (B) - m, we also have rank (B) = m and the columns Bi are 
linearly independent. It follows that V. salsifies BV = W if it belongs to 
the set 

H = {V;.<Bi, V> = Wi for i = 1, 2, . . . . , m} (3.22) 

The set H can be directly identified as a hyperplane of co-dimension m inZ . 
If N = span {Bi, Ba, ...» B„ }, then the hyperplane H can be identified with 
a linear variety, with the help of the unique fixed element Vo = (V) for 

every V G H (see Section 2.7), 
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H = {V; V = Vo+V', V'GN 1 } (3.23) 

For any V in H we have the orthogonal decomposition V = V 0 + V 7 , where 
Vo G N is fixed, v' G N x , and, therefore, Vo x V . 

Using the Pythagorean theorem, we may write 

V T P y = <V, V> = [i V f| 2 = [|Vo || 2 + IJV'I 2 (3.24) 

Since Vo is fixed, the above norm is minimized for V' “ 0, and our solution 

T A 

to V PV = min is V = V 0 . To find explicitly this value, we consider any 
element V£H, and we have 

V = Vo *P N (V) (3.25) 

Since N = span (Bi, B 2 , ...» B n ), this projection is provided by the normal 
equations 


^ O' ~ ^ ^ ^ 




< Bi.Bi > <B 1 ,B 2 > ... <Bi,B b > 


ai 


< B 2 , Bi > < B 2 , B 2 > ... < B 2 , B, > 


a2 


• • « • 


• 

= 

* » • ♦ 
* • • » 


• 

4 


Cr* O? ~ ~ ~ rv/ 




<B„,Bi> <B b ,B 2 > ... < B,,B, > 


a* 


_ __ 


— _ 



< Bi, V > 

< B a , V > 


< B., V > 
(3.26) 


and 


V = 


£ 

1=1 


a i B 


(3.27) 


Using the definition of inner product inland the fact that < Bi, V > = 
Si P V = Wt for any V G H, we obtain 


36 



b\ 


'"-'t 

B T 2 


b t b 


P [ Bi, B 2 B« ] a = W, or (B r PB) a s W 


(3 . 28 ) 


Since B = P’ X B\ and introducing M = BP’ 1 B T , we get 
\ 1 

B P 1 P P 1 B t a = B P 1 B t a = Ma ? W 

and 


(3.29) 


V = Bi ai = B a = P' 1 B T a (3.30) 


i=i 


Since the rows of B are linearly independent, M is invertible so that 

a = M’ 1 W (3.31) 


and 


V = P‘ 1 B t M’ 1 W 


(3.32) 

a well known result! Compare, e.g. , with [Uotila, 1967, Section 3.2], 
The solution is illustrated in Figures 3. 2a and 3. 2b. 


3.2.3 Generalized Model 

A more general least squares model is a combination of observation 
and condition equations of the form 

W = AX + BV • (3.33) 

where W is a known n X 1 vector, A and B are known n X u and n X m matrices, 
respectively (rank (B) - n, rank (A) = u, u < n < m), and X and V are unknown 
u x 1 and m X 1 vectors , respectively. 

Among all possible pairs X, V satisfying AX + BV = W, we want to 

A A “ A « 

find one X, V such that V minimizes the quadratic form V P V, where P is 
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a given positive definite n x n matrix. 

We shall solve this problem through the following device: For every 
fixed X we shall find the unique Vx minimizing V T PV among all V satisfying 
BV = Wx where Wx = W - AX is a fixed vector . Then by letting X vary over 

A * 

R u , we shall find the unique vector V minimizing Vx PVx among all the 
previously obtained vectors Vx . In this way V will also minimize V P V 
among all vectors V belonging to pairs X, V satisfying AX + BV “ W. The 

A A 

only objection to such an approach is that there might exist two pairs Xi, V 
and X 2 , V, both satisfying AX + BV = W and with Xi X 2 . This possibility 

A A A A 

can be ruled out by subtracting AX 2 + BV = W from AXi + BV = W to obtain 
A(Xi - X 2 ) = 0 and noting that Ay = 0 implies y = 0. Indeed, since rank (A) 

n 

= u, the columns At of A are linearly independent and 0 = Ay = E yi Ai 

1=1 

implies yi = 0 for all i, so that y = 0. 

To proceed with the solution we fix X to obtain 

BV = Wx where Wx = W - AX (3-34) 

This is simply a condition equations model, and V T P V is minimized by 

Vx = P 1 B r M 1 Wx = P^b’M'^W-AX) (3.35) 

where 

M = BP^B 1 
Introducing 

Z = P’ 1 B T M 1 W and A = P' 1 B T M' 1 A 

we obtain 

Vx - L - AX or L = AX + Vx (3.36) 

This is an observation equations model and the answer to the minimization 
of Vx PVx (and consequently of V T P V) is given by V = - “Xx where 
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( A t P A) A T P L 


A 

X = 

= ( a t m' 1 b p 1 p P' 1 B t M' 1 A.)’ 1 a t .M-' 1 b p- 1 p p' 1 b t m‘- w 

= (A t M' 1 A)' 1 A t M' 1 W , (3.37) 

and 

V = L - A X = P' 1 B T M' 1 W - P* 1 B t M" 1 A X = 

= P' 1 B T M' 1 ( W - A X ) (3.38) 

These are exactly the results we obtain through variational approaches as it 
can be seen by a comparison with equations (123), (120) and (122) in 
[Uotila, 1967, p. 57]. 

3.2.4 Probabilisitic Justification of Least Squares 

The philosophy of least squares techniques can be summarized as 
follows: A vector of n observables is a priori known to belong to a set of 
vectors satisfying a certain mathematical model. The linearity of the model 
leads to a linear subspace or hyperplane of R n as the set of vectors satisfy- 
ing the model equations. We shall call this set the "model space." When a 
vector of n observations corresponding to the vector of n observables in 
question is realized, it is in general found to be outside the model space 
because of observational errors . It is only natural then to suggest as an 
estimate of the observables an element of the model space which is as close 
as possible to the observations vector. This approach incorporates the 
anticipation of observational errors being small or at least not arbitrarily 
large. What is more arbitrary is the introduction of a certain metric , which 
depends on an arbitrary symmetric positive definite weight matrix P, through 
the relation 
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P(x, y) « II x-y II = y < x - y, x - y > 


% 

= £ (x-y ) T P (x-y) } s 

The above P-dependent inner product introduces a certain geometry and 
turns K n into a Hilbert space Lp. 

The selection of the estimate of the observables depends on the 
metric, i.e., on the matrix P, and is therefore arbitrary to the extent 
that P is arbitrary. 

The question of metric optimality, i.e. , of the selection of an 
optimal weight matrix P, finds an answer when the observational errors 
are modeled as random quantities with zero expectation and finite variance- 
covariance matrix. We shall examine each particular least squares method 
separately in this respect. 

Observation equations. The model now becomes probabilistic. The 
unknown observables L*GR" relate to the unknown parameters X G R u by 
L* = AX where A is a known n X u matrix with rank (A) = u. 

The actually -obtained (known) observations L® relate to the unknown 
observation errors V w by L® = L* + V®, where V® is considered to be an 
outcome (i.e. , a value for some fixed 60 ) of a random vector V(00) with zero 
mean E {v3 = 0, and known covariance matrix E Cw T 3 = S. Accordingly, 

L® is the outcome of a random vector L(00) = AX + V(W), with unknown 
mean E{l3=AX + e{v} = AX, but known covariance matrix 

E{(L - AX)(L - AX) t } = e{W t } = S (3.39) 

The deterministic model related to outcomes of the relevant random 
variables is 

L® = AX + V® (3.40) 

where L®, A are known, and X, V® are unknown. This is the usual obser- 
vation equation model, and minimization of (V®) 7 PV® leads to an estimate 
X P of X 

X P = (A t PA) _1 a t pl“ 
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(3.41) 



A 

where the subscript P of X P denotes the dependence of the estimate on the 
weight matrix P used. By letting to vary we obtain 

X P (( 0 ) = (A t PA) -1 A t PL(to) (3.42) 

so that X P = Xp is the outcome of the corresponding random vector X P (00). 

A 

Xp(to) has unknown mean 

E { &p{w) } = ( A T P A)' 1 A T P E { L { &> ) } = 

= (A t PA)' 1 A T P AX = X (3.43) 

but known covariance matrix 

Q P = [{ A t P A )' 1 A t pj S 

= (A t PA)' 1 A T P S PA (A T P A)* 1 (3.44) 

We say thatX^ is an unbiased estimate of X to denote that X*£ is the outcome 
of a random variable X P (to) whose expected value equals the true value of X. 

A particular choice of weight matrix P = S’ 1 leads to an estimate 
X s of X, which is the outcome (X s = X s ) of a corresponding random vector 
X s (ou ) with 

E{X S (U))} = X (3.45) 

and covariance matrix 

Qs - ( A T S' 1 A)' 1 A t S' 1 S S' 1 A ( A t S' 1 A)' 1 = ( A T S^A)' 1 (3.46) 

It can be shown that 

Qs £ Qp for any qualifying P 

in the sense that for two square matrices A and B, A * B if the ma trix A- B 
is nonnegative definite. See, for example, [Deutsch, 1965, p. 62]. 


T 

[(A t P A )’ 1 A t P j 
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If now q is a scalar parameter linearly related to X 


q 


di = d T X 

i 


(3.47) 


we have the following estimates of q and their corresponding variances: 


qp = d Xp CTp 

q 3 = d T X 3 or| 


d T Q P d 
d T Q s d 


Since Q P - Qs is nonnegative definite, we have that: 


(3.48) 


d ( Q p — Q s ) d ^ 0 

so that 

d T Q P d ^ d T Q 3 d and CTp £ ct 2 3 (3.49) 

This means that the choice of weight matrix P 1 = S = E [VV T 3 leads to a 
minimum variance estimate for any scalar linear function of the parameter 
vector X . 

Another statistical property shared by any least squares estimate of 
a linear function of X is unbiasedness 


E { Xp ( o) ) } = X 


(3.50) 


E { q P ( <o ) } = E f d T Xp ( a> ) } = d T E £ X P ( w ) } = d T X = q 

(3.51) 
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Condition equations . The deterministic model 


BV“ = w“ (3.52) 

(B, W 01 known, unknown) refers to outcomes of corresponding random 
vectors V(w), W(to). V(co) has zero mean e(v} = 0 and known covariance 
matrix E{w t 3 = S. It follows that W(W) has zero mean E{w3 = BECv} = 0 
and known covariance matrix E(WW t } = BSB T . For any choice of weight 
matrix P = D'\ the least squares estimate of v“ is 


V = V w = D B T (B D B 1 )' 1 W w 


(3.53) 


Letting to vary, we obtain the related random vector 


V(&>) = D B T (B D B 1 )' 1 W (to ) 


(3.54) 


with E { V(to) } = 0 and covariance matrix 


E [ V(co) V T ((o\ •}; f D B t ( B D B t Y 1 B S B T ( B D B T )* B D (3.55) 


The vector is related to observations L 03 through W a =BL^ and L 03 
corresponds to unknown observables L by means of L = L + V where 

A 

B L* = 0 . An estimate I* of L* can be obtained by means of 


A ^ A (t) A 

L = L = L - V 


(3.56) 


The error of the estimate L is defined to be 

e = L* - L* = ( l w - V w ) - ( L w - V w ) = - e' 

(3.57) 

The corresponding random variable e(to) = V(to) - V(to) has zero mean, and 
its covariance matrix Q« is easily found to be 

Q e = DB T (BDB T )' 1 BSB T (BDB T ) 1 BD - DB t (BDB t )' 1 BS 

- SB T ( BDB t )' X BD + S 
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Using the matrix identify from [Liebelt, 1967, p. 30, equation (1-53)], 

A C A T - BA t - AB t « (A-BC 1 ) C (A - BC' 1 ) T - B C' 1 B T 

(3.59) 

with A -> DB t (BDB t )'\ C ■* BSB T and B -* SB T , we obtain 
q 9 = [db t (BDB t )* “ SB T (BSB t )*] (BSB T )* X 

[dB T (BDB T f - SB T (BSB 7 )' 1 ] - SB T (BSB T ) 4 BS +.S 

(3.60) 

Following the reasoning of [Liebelt, 1967, Section 5-3, p. 139], Q* can be 
minimized in the sense that y T Q e y ^ y T (min Q e )y for any y £ R n , by a choice 
of P such that 

DB T (BDB t ) l - SB T (BSB T )' 1 = 0 (3.61) 

i. e. , by choosing P = S’ 1 or D = P' 1 = S. For this choice we have an estimate 
of V: 

V = V W = S B T ( B S B t )" x W W (3.62) 

and the minimum covariance matrix of the prediction error e(to) becomes 

Q e = S - S B T ( B S B T ) -1 B S (3.63) 

Again the choice P = S’ 1 related to the probabilistic structure of V justi- 
fies the least squares solution as a minimum variance of prediction error 
solution . 
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Generalized least squares model . In this particular case the 
introduction of probabilistic reasoning oversimplifies the derivation of 
the corresponding solution. The deterministic model is 

W-“ = AX + BV“" (3.64) 

with the corresponding random vector V(co) having zero mean E {v} = 0 
and known covariance matrix E {w T } = S. By setting e = B V, we obtain 
a model identical to the case of observation equations 

W® = AX + e 40 (3.65) 

where the corresponding random vector e(a) ) has zero mean E{e} - 
BE{v 3 "0 and known covariance matrix E {ee T } - BSB T = M. The ' 
minimum variance solution corresponds to the choice of weight matrix 
P = M’ 1 

X = X W = (A t M‘ 1 A)' 1 a t m’ 1 w w = 


[ a t (bsbV a f a t ( b s b t y 1 w 


CO 


(3.66) 


This is exactly the solution of the original model 

co co 

W = A X + B V 

corresponding to the weight matrix choice P = S’ 1 . 


3,2.5 Weighted Parameters and Stochastic Processes 

One particular case of least squares is when some or all of the 
parameters are considered to be random quantities with known expectation 
and covariance matrix. In the case of the generalized model we have 


W = A X + B V 


[AG] 



+ B V 


AX + Gs+B V 
(3.67) 
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with E { V 3 = 0, E { V V T } = Cvv 


E{s } “ S, E{(S-S)(S“S) T } = Cbb 

setting s = s + s (E{s) = 0), we obtain 

W - G s = A X + [ G B ] 


s 

V 


(3.68)- 


This can be written after some obvious changes in notation as 

W « A X + B V 


with 


E { V 3 = 


E { s 3 
E { V 3 


** 0 , 


(3.69) 


" E { s s T 3 

E { s V T 3 " 


On 

C*v 

E { V s T 3 

E {VV T 3 


Cv « 

Cvv 


E t V v T 3 = 


This is exactly a simple generalized least squares model with minimum 
variance solution 


= S 


X = ( A T M* 1 A y 1 A t M' 1 W , 


where M = B S B 1 

(3.70) 


and corresponding covariance matrix 


c xx = ( A_ T M^A) 1 (3.71) 

Usually the additional assumption C s v = 0 holds, so that 
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M 


[ G B ] 


and 


’c B « 0 “ 


1 

1- 

O 

I 




0 Cvv 


b t 


G C«, G T + B Cvv B t 
(3.72) 


“1 

X = £ A T ( G C, a G T + B Cvv B t y 1 A j A T ( G C 8 , G T + B Cvv B T )* X W 

(3.73) 

In the particular case that B = G = I, and with some changes of notation 
s -* s', W"> x, V n, C*s C B ' 8 ' = C, Cvv C nn = D, C = C + D, we obtain 

X = (A t C*A) 1 A t C' 1 x (3.74) 

one of the results of Moritz [1972, p. 15, equation (2-35)] , under an approach 
which he calls "least squares collocation, " in connection with the model 

x — A X + s’ + n (3.75) 


Equation (2-36) in Moritz [1972] can be similarly derived applying results 
from classical least squares (minimum variance) methods . As to the third 
of the results (his equation (2-38)), we will have more to say in Section 
3.3. 

The case where all parameters are random reduces to the method 
of condition equations 

W=As' + BV, E (s 3 = s , E tv} = 0, s = s - s , 


E { s 3 = 0-, E{ ss T } = C„ , E{ V V T 3 = Cw 

(3.76) 


We rewrite the model as 

W-As=As+BV = [AB] 
or after some obvious change of notation 


s 

V 


(3.77) 
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B V 


W, 


with 


e{ y 3 - 


eCs 3 
e{ v 3 




The solution is 


S B t (BSB 1 )' 1 W (3.79) 


In the particular case that C B v = 0, we obtain 


s — Cji 

a t 

( A C BB A t + 

rv 

B 

Cvv 

B t f 

W 

(3.80) 

V = Cvv 

b t 

( A C BB A t + 

/Si 

B 

Cvv 

B T )' 1 

W 

(3.81) 


Of particular interest is the simple case when A = B = I (observing a set of 
random quantities s under additive white noise n (V n)), 

W = S + n 

with solution - 

s = C BB (C BB + Cnn)" 1 W (3.82) 

We shall return to this equation in Section 3.3, in a somewhat different 
context. 

Up to now we have discussed random parameters without reference 
to how they might arise in an actual real life situation. As we have seen, 

/-N-» 

unknown random parameters s with known expectation s' and covariance 
matrix can be reduced to random parameters s with zero expectation and 
known covariance matrix. 
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In a least squares context (with minimum variance justification of 
the metric used), the distinction between zero mean random parameters s 
and zero mean random observational errors V is strictly verbal 

The solution algorithms we derived are exactly based in this lack 
of differentiation between the two. However, the distinction is very real 
in the process of modeling a real life situation, i.e. , in the process of 
arriving through scientific reasoning to a least squares probabilistic model. 
The reasoning behind modeling observational errors as random variables 
with zero mean is too familiar to be repeated here, but the case of random 
parameters needs some discussion. A linear least squares model arises 
when a set of observations corresponds to a set of observables known 
(modeled) to be in a linear (or properly linearized) functional relation to a 
set of unknown parameters. Some (or even all) of these parameters may be 
connected to some unknown underlying function (or a number of such functions) 
with a certain domain T . 

If the unknown function is modeled as a second-order stochastic 
process (usually called signal) with known mean value function and (auto)- 
eovarianee function, we may reason as follows: The underlying stochastic 
process is viewed as a mapping £: £2 4>, where {£}, p} is a probability 

space and <£> is the space of the sample functions of the process £ (t, 00). The 
connection of an unknown parameter si to the underlying function is mathe- 
matically provided by the concept of a functional with domain <f>, i.e., by a 
mapping if: <l> -* R. If the composite mapping if ° £ : Q, -* R is measurable 
with respect to the probability space p}, then s' i = if (£ (t, oo)) is a 

random variable. The whole set of such random variables forms a vector 
s’; and if the mean and covariance matrix of s’ can be induced from the known 
mean value function and covariance function of the stochastic process 
£ (t, oo), we have arrived at the models we considered at the beginning of 
the section. 

Of particular interest is the case when the space of sample functions 
<J>is a Hilbert space and the functionals if belong to tho dual space 4>*, i.e., 



the set of linear bounded (continuous) functional on We will have a chance 
to return to this case later on for a more rigorous treatment of the problem. 

3.3 Linear Least Squares Prediction 

In linear least squares adjustment (observation equations), the 
original estimation problem is mathematically modeled as linear best ’ 
approximation problem. A known element of a space is best approximated 
from the elements of a known subspace. However, the known element to be 
approximated does not appear explicitly in the solution (normal equations). 
Instead, only its inner product with the elements spanning the subspace of 
approximation need to be known . 

We have already exploited this fact in the case of condition equations' 
where we have obtained the solution by best approximating an unknown 
element V from the elements of a subspace N. The additional condition 
that V should belong to a certain hyperplane H provided us with the knowledge 
of the values of the inner product between V and any of the spanning elements 
inN. . 

Based upon this key observation, we shall now examine an estimation 
technique where an unknown element is best approximated from the elements 
of a known subspace, and the necessary information for the solution is con- 
tained in the a priori knowledge of the inner product values between the 
unknown element and the spanning elements of the subspace. 

In the case of least squares adjustment techniques we started with a 
deterministic solution based upon an arbitrary inner product (arbitrary 
weight matrix), and finally the particular choice of a statistically meaningful 
metric (choice of P" 1 = E {W T }) led to statistically meaningful results 
(minimum variance solution). Here we shall follow an exactly reversed 
path. We shall start with a statistically meaningful metric (inner product) 
and a minimum variance solution to finally obtain a deterministic solution 
with a corresponding weight matrix. 
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Because of this duality (deterministic-probabilistic prediction) , we 
have chosen the title- "linear least squares prediction," instead of the usual 
"minimum variance or minimum mean square error prediction, " also follow- 
ing the terminology in [Doob, 1953, Chapter 12] and in [Cramer and Lead- 
better, 1967, Section 5.7]. 

3.3.1 Probabilistic Approach (Minimum Variance or Minimum 
Mean Square Error Prediction ) 

We shall first present the method in a somewhat abstract context, 
and then proceed to show its connection to real life estimation problems. 

The proper environment for our method is a space with elements 
square integrable real valued random variables (i.e. , random variables 
£ (co) with e{|£(co)| 2 }<®). If the random variables are defined on an 
abstract probability space {to, A , P}, we shall denote this space of square 
integrable random variables by Z? (£2 , A, P) . This space can be turned into 
a Hilbert space by introducing the inner product 

<x(CO), y(CO)> = E {x (CO) y(C0) } (3.83) 

(to. A, P) can be shown to be a linear space and the correlation of two 
elements can be easily shown to satisfy the defining properties of inner 
product. It can also be shown that Z 2 (£2 , A, P) is a complete space (see 
[McGarty, 1974, p. 136] ) and consequently a Hilbert space. 

Given a set of random variables Xi(co)£Z 2 (to, A, P), i = 1, 2, . . . , 
n, we denote their. span by MC Z 2 (to, A, P). For any other random variable 
y(co ) £ C (&, A, P), we want to find the best approximation y(co) to y (CO) 
from M. The answer is well known to be 

y(W) = P M ( y(cc)) 

Since y (CO) £ M, and if we further assume that the random variables xi (co) 
are linearly independent, there exists a unique set of constants a*, such that 
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y((0) a ^ a i xi(C0) 
i=i 

The constants a t are determined from the normal equations 


— — 


_ — 

< Xx , Xl > < Xi , X 3 > ... <Xi, x« > 


ai 

< X2J Xl > < Xg , X;} > ... <X 2 , X, > 



• • • « 


• 

« • • * 

• • • • . 


• 

• 

< X M Xl > < x a , x 2 > ... <x n ,x a > 


a* 

— — 


— — 


In matrix notation 

Cx* a = C xy (C xx is n*n, C xy is nx l 


where 


E { x t (co) x 5 (<0) } 

and 

O 

X 

II 

E { Xi(to) y (co) } 

Introducing the matrix notation 

* 

we have 

x = [ Xl, x 2 

> • • • > x n ] T 


C xx = E { x x T 3, 

C xy = E { x y 3, 


y ■ E a - x * 

= a T x 

and 

y = C xy C xx 

X 


( 3 . 84 ) 

<x t , y > 

< x 2 , y > 

« 

< x ft , y > 

( 3 . 85 ) 

) ( 3 . 86 ) 


& xy 

( 3 . 87 ) 

( 3 . 88 ) 


53 



Let us now examine how such a problem may arise in practice. 
Suppose we observe the values of a function s(t) at certain points ti, 1 = 1, 2, 
...» n, and we want to predict its value at some other point t P . The observa- 
tions -and- the value to -be -predicted correspond to- functionals on some function 
space containing s(t) called evaluation functionals and of the form 

if (s) = s(ti) i = 1, 2, . . n 

and (3.89) 

if (s) = s(t P ) 


If the unknown function s(t) is modeled as a second-order stochastic process 
s(t, to) (signal), then the evaluation functionals on the sample function space 
define random variables 


, * . Cfl . ti 

li (s ) = s (to) 


(3.90) 


The observations are outcomes of the random variables s ‘(to). The 

tp 

outcome corresponding to s (to ) remains unknown and is approximated 
by the outcome corresponding to the projection of s tp (to ) on M = span 

IsV)} 


s P = 



a i x t 


= a T 


x 


(3.91) 


-i 

where a = C e8 C lP and s P = 

The elements of the matrices involved are 



- E { 

s * 1 ( Ui ) 

S tj ( to ) } 

[ C.p ], 

= E{ 

sN w ) 

s tp ( CO ) } 


Cjp cA x 


« r ( tj, tj ) 

(3.92) 

= r ( t t , t P ) 

(3.93) 


r>-i 



where r(t, s) is the correlation, function of the stochastic process s (t, co ) . If 
s(t, co) has zero mean, then r(t, s) is its covariance function. From a given 
process with known mean value function, we can always construct one with 
zero mean (see Section 2.4), and we will, therefore, assume that the : 

stochastic processes involved are zero mean, without loss of generality. 

I 

In the more general case, our model might involve not one but a 
number of second-order stochastic processes of known autocovariances and 
cross -covariances (when correlated). The observations and predictions) 
need not necessarily correspond to values of these processes at certain 
points tj, t p . The only conditions on their nature is that the corresponding 
functionals 1 on the sample function space $ of the process induce mappings 
1 o <p : ft ^ R (<p: ft-» ®), which are measurable with respect to the probability 
space P} (i.e. , random variables) and also belong to;£ 2 (ft, P) 

(i.e. , they have finite variances). 

The Hilbert space where the approximation of s* p (CO ) from the sub- 
space M takes place need not necessarily be X 2 (ft ,</€, P) . It can be replaced 
by a Hilbert space (s (t, CO), tGT) with the same inner product and 
defined as the completion of the space s (t, CO), t G T) of all random 
variables u(to ) which may -be written in the form 

k 

u(co) = ^ C[ s tj ( CO ) , c t G R (3.74) 

1=1 

(s (t, CO), tGT) contains all elements in ^(s (t, CO), tGT) and the limits 
of sequences of such elements. For more details, see [Parzen, 1959, p. 

259]. 

Of particular interest is the case where one of the processes involved 
x(t, co ) has sample functions in some Hilbert space H, and any other process 
y (t, co) has sample functions also in H, related to those of x(t, CO) through 
a bounded linear operator L:H H 

y“=L{x w } . (3.95) 
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(3.96) 


Any bounded linear functional f on H has a representer f EH 

fV) = <A f> 

CO 

If g is afunctional acting on y (t), with representer g, then 

g*(y > ) = <A g> = <Lx“, g> (3.97) 

If we introduce an operator L* called the adjoint of L (see [Taylor, 1958, 
Section 4.9, p. 249]) and satisfying 


< Lx, y > = < x, L y > 


(3.98) 


we obtain 


g*(y) = <Lx, g> = <x, L^g > = <x, h > = h*(x) (3.99) 


where 


h = L* [g 3 


We need, therefore, deal only with the process x(t, co) and linear bounded 

- ^ 

functionals f E H , the dual space of H. 


3.3.2 Deterministic Approach (Collocation) 

In this section we shall examine a deterministic prediction technique 
introduced by Krarup [1969], in relation to the prediction of quantities 
related to the gravity field of the earth, and usually referred to as (exact) 
collocation. We shall follow Tscheming [1973] with a little more emphasis 
on "geometry. " 

Strictly speaking, collocation is a technique for finding a solution to 
a differential equation with insufficient boundary data. The differential 
equation admits an infinity of solutions and sufficient boundary data determine 
a unique solution among all possible ones. Insufficient boundary data restrict 
the candidate solutions to those satisfying the boundary conditions. Colloca- 
tion is then a technique for determining a solution which is the smoothest, 
in some certain sense, among all solutions satisfying the boundary conditions. 
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Here we shall approach the problem without reference to any 
differential equations. We shall simply assume that we observe the values 
of bounded linear functionals on some unknown function f belonging to a 
Hilbert space H, and we want to find an estimate f of f , such that 

II f I = rajn ||f|| (3.100) 

where P is the set of functions in H satisfying the conditions imposed by 
the observations . 

Let the observations di, i = 1, 2, . . . , n correspond to bounded 
linear functionals if on H (if E H*) with representers lj £ H 

d t = if (f) = <f, lt> i = 1, 2, . . . , n (3.101) 

This set of equation's. can be directly identified as a restriction that f should 
belong to a hyperplane P in H, which can alternatively be described as a 
linear variety, with the help of the unique element go = ^“^(f), for every 
f E P (see Section 2.7, p. 25): 

P = Cg; g = go + g', g 7 E M x } (3.102) 

where 

M = span £ It, i = 1, 2, . . . , n } 

The situation is now similar to the case of condition equations in linear least 
squares adjustment, and the element E P with minimum norm is provided 
by 

. fo =P M (S) . (3.103) 

where g is any element in P (see Figure 3. 3). Since fo£M, we have 

n 

f 0 = a, 1, (3.104) 

1=1 

and the coefficients aj are provided by the normal equations: 
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Figure 3. 3 The Geometry of Collocation 
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< ll, ln> 

^ ^2 > ll > ^ ^ 2 > 1 2 ^ • • • < 1 2 * 1 n > 

• • • » 

• • • • 

• « » » 

<l».ll> < 'la»l2 > ••• < -ln>ln- > 

(3.105) 

or, in matrix notation, 

C a = d , with Cjj = < 1 1 , 1 j > (3. 106) 

If the representers lj are linearly independent, then C is nonsingular 
and 

a = C 1 d (3.107) 

We have, therefore, estimated the unknown function f by its projection on 
the "data space, " i.e. , the space generated by the representers of the 
functionals corresponding to the available data . 

We are next interested in estimating the value of some other 

3je 

possibly different bounded linear functional 1 P on f. Its true value is 

d P = 1* (f) = <f, 1 P > (3.108) 

We can obtain an estimate using the estimate ip of f 

dp = 1* (fo) = <fo, 1 P > (3.109) 

The error of prediction is 

d P - d P = < f, 1 P > - < fo, 1 P > = < f - fa, 1 P > (3 . 110) 

Since f E P and fo E P, we have that f - fo E M 1 . If we decompose 
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where 


lp _ Ip M + lp M 


(3.111) 


l m = ^ M < lp) “d W£M A 

we obtain, taking into account that also f 0 £ M, 

dp -d P - <f-f 0 ,lpM 4 'lpM-*- > “ 

= <f,lp M J- > = lp M o.(f) 

i. e, , only the part of 1 P in M x contributes to the error* If l PM a. = 0, i. e. , 

A 

Ip G M, then d P - d P = 0, Especially when 1 P = lj, then obviously lj G M, 
and 

d i - d t = < f-fo , 1 4 > « 0 (3.113) 

We have, therefore, recovered the original observations, i.e., the approxi- 
mation fo obtains the values observed. This justifies the name, collocation 
as in the case of a differential equation solution. 

An alternative way to look upon the estimate is the following 

p = <fo,lp > = < fo»lpM + Iph 4 - > ~ < fo»lpM > + < 1 o*^ph* > = 

= < 'fo>lpM' >=: < fo - f + f » l.PH > = 

= -< *f — fo»lpM' > < 'f>lpH' > = < f , 1 P M > 


Hence 


d P - f » 1 p m ^ ~ l P (ji(f) 


(3.114) 
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This means that the prediction corresponds to the true value of a functional 
l PM with representer (see Figure 3.4) 

lpM= ^M (lp) (3 - 115) 

Since ib = £ at li, we have for. the prediction 

i=i 


a 

dp — <f 0 , 1 P > = 53 ^ i 1 1 > Ip- 5 ’ = \ a * < 


1 i » 1 p > ~ 


[ 3-i, a .2 • • • a n ] 


< li , 1 p > 


<1 Q ,1 P > 


= a T C P = 


Cl a = C P T C' 1 d 


(3.116) 


where 


[ C P h = < 1 1 » 1 p > 


To apply the method one must obviously know how to compute inner 
products between elements of the Hilbert space. Furthermore, the function 
f to be approximated is usually known to belong to a linear space which 
becomes a Hilbert space only after the introduction of an inner product. 

The choice of inner product is generally not unique and the same original 
linear space may give rise to different Hilbert space, different geometries, 
and, consequently, different choices of approximations Jb to the original 
function f. 

The problem of inner product choice remains open in a s imilar way 
as with the problem of choosing a weight matrix in the case of deterministic 
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linear least squares adjustment. Again the answer will be found by intro- 
ducing probabilistic reasoning to justify metric (inner product) optimality. 

We shall consider this problem in the next section where a link between the 
probabilistic approach (minimum variance prediction) and the deterministic 
approach (collocation) will be established. Before we do this we need to 
introduce the concept of a reproducing kernel in a Hilbert space . 

Let H be a Hilbert space of functions f: T R. A function 

k: T X T~* R is said to be a reproducing kernel k(t, s) for H if 

t t * t 

(a) k (s) 6 H for every t E T (k (s) = k(t, s) denotes the mapping k : T~* R 

for fixed t) 

g 

(b) f(s) = < f , k > for every s£T and every f EH (reproducing property). 

It can easily be shown that k(t, s) is symmetric, i.e., k(t, s) = k(s, t). 

For a Hilbert space H to have a reproducing kernel, it is necessary and 
sufficient that the linear functionals (evaluation functionals) 

1* (f) = f(t) for every f E H and t £ T 

are bounded, i.e., l t G H . (See [Aronszajn, 1950] and also [Lauritzen, 
1973, Chapter 4]). 

If e n (t), n = 1, 2, ... is an orthonormal basis for H, then 

00 

k(t,s) = ^ e„ (t ) e n (s) (3.117) 

n=l 

We shall next apply the concept of a reproducing kernel to two particular 
cases of collocation. 

Case A: All functionals if, if involved are evaluation functionals 

l?(f) - f(t t ), l*(f) = f < t P ) (3.118) 

The representer l(s) of any functional 1*£ H is related to the reproducing 
kernel by means of 
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(3.119) 


* <S 

Ms) = 1 (k s ) 

For a proof, see [Shapiro, 1971, theorem 6.2.4. 1, p. 85]. In particular, 
for evaluation functionals we have 

li(s) = 1 1 ( k S ) = k S ( ti ) = k tl (s) (3.120) 

It follows that, in view of the reproducing property and the symmetry 
of k(t, s) 

<li,lj> = <k‘,k , > = k tl ( tj ) » k(t 1 ,t J ) (3.121) 


and, similarly, 

<1, , 1 P > = k(t,,t P ) (3.122) 

i. e. , the elements of the matrices C, Cp are obtained by evaluating the 
reproducing kernel at the corresponding points 

C„ = k(tj, tj ), [C P ], = k(t, , t p ) (3.123) 


Case B: The functionals 1*, 1* involved are of the form 

Mf> - (Lf, (ti) . 4(f) . (Lf, (tp) 

for some bounded linear operator L: H ** H. 

I I I -r 

■ Let 1, , lp denote the evaluation functionals of the previous case. 
Then 
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< L f , 1 , > 


(3.124) 


li(f) = 


= < f , L* 1 1 > 


where we have made use of the definition of the adjoint operator L of L. 
It follows that if has representer 


l t = L 1 i = L 1 1 ( k ) = L k 1 (3.125) 

MaMng use of the definition of the adjoint and the reproducing property, 
we obtain * 

<l,,lj> = < L k* 1 , L k tj > = < L L k* 1 , k tj > = 

■ (LL * kt, >(t,) ■ (U ’ k| it 1 ,.,) 

(3.126) 


In a similar way , 

< 1 1» Ip > = (LL*k) p) (3.127) 

(LL k is a shorthand notation for L t [L e k(t, s)], where subscripts of 
operators denote the variable on which the operators act.) 

The elements of the matrices C and Cp are obtained by applying the 
operator 'on the reproducing kernel, according to a "law of propagation” of 
the reproducing kernel . The similarity to the law of propagation of 
covariance functions of stochastic processes under linear transformations 
provides a ,f hint” to a stronger connection to be revealed in the next section. 

• t * 

3.3.3 Relation Between Probabilistic Approach and Collocation 

In the two previous sections, we examined two ways of modeling and 
solving the problem of prediction of quantities related to an unknown function 
from observations related to the same function. At a first glance the two 
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approaches seem radically different. In the probabilistic approach, the 
unknown function is modeled as a second-order stochastic process s(t, to ) 
and the prediction problem becomes an approximation problem in the Hilbert 
space it (s (t, co-), t 6 T) C<£ 2 (&,<*£, P). In the deterministic approach, 
the unknown function is modeled as an unknown element f(t) of a Hilbert 
space H of functions x: R, and the prediction is dominated by the metric 

induced by the inner product in H. Knowledge of the structure of the 
unknown function (e.g. , continuity, differentiability) usually gives rise to a 
number of different inner product choices and, consequently, different 
Hilbert spaces H where the prediction takes place . 

The problem of inner product choice is equivalent to the problem of 
covariance function choice in the probabilistic approach. It will be shown 
here how a connection between a certain covariance function and a corres- 
ponding inner product may reveal the equivalence of the two approaches. 

Our main tools for this purpose are going to be the Moore-Aronszajn-Loeve 
theorem and the Karhunen-Loeve expansion of stochastic processes. 

The Karhunen-Loeve expansion is usually given for intervals of the 
real line, i.e. , for stochastic processes 4 (t, co), t E T with T = [a, b] C R. 

A formal exposition can be found in [Papoulis, 1965, p. 457], and a more 
rigorous treatment with an exposition of the conditions 4(t» to) must satisfy 
for the expansion to hold can be found in [Gikhman and Skorokhod, 1969, 
p. 188]. We shall give here a more general exposition of the Karhunen-Loeve 
expansion, without restrictions on the specific nature of T, for the case 
where 4 (t, co) has sample functions in a Hilbert space and can be viewed as 
a Hilbert-valued random variable. We shall follow [Lauritzen, 1973, 

Chapter 6] and [Rozanov, 1968, Chapter I, Section 3] with a minor shift of 
emphasis from the dual space H* to the Hilbert space H itself, and without 
introducing the restruction of 4 (t, co) being Gaussian. 

A second-order stochastic process 4(t, co) (4: T x R) with 
sample functions 4^ (t) in a Hilbert space TI of functions x: T -» R can be 

alternatively viewed as a Hilbert space-valued random variable 4: IF. 

GO 



Assuming without loss of generality that E f £ (t, to) } = 0, the covariance 
function of £ (t, to) is a mapping r: TXT^R, defined as r(t, s) = E {4* £* }. 
Now, if we view 4 (t, to) as a Hilbert-valued random variable, we have 

r ( t,s ) = E{lt(C i °) M4 W > } (3.128) 

where if, 1? G H* are evaluation functionals 

1* (x) = x(t) for every xGH 

Let H t CH denote the set of all evaluation functionals in H and define a 
mapping R*: H* x Hf R as 

jjt jf; 

R ( l t , 1. ) = r(t,s ) (3:129) 


This mapping can be extended to the whole H* (R*: H*XH^ R) by 
setting 


* * * 
R Hi, ij 


E { 1?U W > lt(4 W ) 3 for every 1*,1* G H* 

(3.130) 


Since each 1* E H* has a unique representer l£H, we can define a mapping 
%: H X H -» R as 


R ( li , lj) 


j|c 

R ( 1 1 , 1 s ) for every li, lj E H 

(3.131) 


ft can be represented by an operator H -* H by means of 


R(l, , 1) ) ■ <ai,,l ) > (3.132) 

H 

Such a bounded linear operator has the following properties: 

(a) is symmetric (self adjoint): 

<ftx,y> = E{x*(4 W ) y*(4 w ) 3 = E{y*(4 t0 ) x*(£ w )}.= 

H 


— <71 y, x > = < x, %y > (3,133) 

H H 

(See [Taylor, 1958, p. 324].) 
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(b) is positive (or positive semidefinite, or nonnegative), since it is 
symmetric and 


< m x, x > 


E 


[■*«•>]' 


for every x G H. 
(3.134) 


(See [Dunford and Schwartz, 1963, p: 906].) 

(c) $is a Hilbert-Schmidt operator. For a definition of the term, see 
[Dunford and Schwartz, 1963,. p. 1010] . A proof that $ is Hilbert- 
Schmidt is given in [Grenander, 1963, p. 129], based upon corollary 3, 
p. 1011 of [Dunford and Schwartz, 1963], 

(d) 1$L is- compact (completely continuous) . For a definition of the term, 

see [Taylor, 1958, p. 274] or [Kolmogorov and Fomin, 1970, p. 239]. 
This property follows directly from the fact that $ is Hilbert-Schmidt 
(see [Dunford and Schwartz, 1963, p. 1012, theorem 6]). 

(e) ^ has finite trace, i. e. , if {e n } is a complete orthonormal sequence in 

H [Grenander, 1963, p. 129]: 


Y <&e n ,e n > = Yj E U < e n , 4 W > l 8 } = E { ||$“ ||J } < 
» H H (3.135) 

where £ is restricted to the class of second-order Hilbert space-valued 
random variables satisfying the last inequality. 

Such linear bounded operators which are symmetric, positive, and 
have finite trace are called S-operators in the relevant literature [Prokhorov, 
1956, p. 172; Bharucha-Reid, 1972, p. 48;' Grenander, 1963, p. 129; 
Parthasarathy, 1967, p. 154], 

Since $ is symmetric and compact, we have the following theorem 
based on corollary on p. 251 of [Kolmogorov and Fomin, 1970] (see also 
[Taylor, 1958, p. 336, theorem 6.4-B; and Bharucha-Reid, 1972, p. 48]): 

Theorem : ^£has a sequence of eigenfunctions { e n ) with corresponding eigen- 
values {Xj, such that { e n 3 is a complete orthonormal sequence 
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in H, and 


Vtx = E X n < x , e n > e n for every x E H 
» (3.136) 

It follows from property (e) above that 

^<$e n ,e n > = X n e a , e n > 

n n 

and 

X n ^ 0 

Since ^ is positive, we also have 

<??e n ,e n > = <X n e n ,e n > = X n ss 0 (3. 138) 

H H 

We shall refer to $ from now on as the "covariance operator" of £. 
Strictly speaking, $ as defined here is "correlation operator, " and becomes a 
covariance operator under the assumption that m = E { £ 3 = 0 , where the mean 
element m E H of £ is defined by means of 

x(m) = E{x(^ a> )} for every x* E H* (3 . 139) 

If m / 0, we can introduce a covariance operator ^ by means of 

< 6 x, y > = E{x(£ W -m) y ( - m) }, x,y E H 

(3.140) 

The representer l t of the evaluation functional 

* 

It (x ) = x(t ) , x E H , 

has a Fourier expansion in H 


* • X) X “ (3.137) 

n 
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(3.141) 


It 


V e n (t) e„ 


since for every xG.H w.e hav.e 


* 

l t (x) = <x,l t > = <x, Ve n (t) e n > 

H H 


= S < x ’ e « > e » (t > 

H 

n 


x(t) 


(3.142) 


Applying the above theorem (equation (3-136)), we obtain 

1 1 = < lt> e n- : ^ e n * 2 ^ ^-n e n (^ ) e n 

n " (3.143) 


It follows that 

r(t,S) = <??l t ,l s > = < X^^n e n( t ) e n»y^ e n(S)ei,> 

« - H 


= Z)L x n e n(t) e a (s) <e n ,e. > 

- H 


^-n ®n(^) ®»(®) ^n» “ A n 6 n (t) e n (s) 

” “ n (3.144) 


Since $ is positive we have A„ ^ o, and setting A „ = a„ we obtain 


r(t,s) = 2^ CT " °»< t ) ® #" ( ® ) (3.145) 

n 
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For any fixed co, ^ G H and has a Fourier expansion 


.to ^ .to 

£ / V < £ 9 ® n ^ 

i r 


X £n< W ) e n 


The functional d* (^) = £ n (00 ) has representer d n = e n since 


< 4, e n > - Sn = d n (£ ) 


and, therefore. 


E { £ n (co) S»(t0) 3 = < He n , e* > 

H 

Since e n is an eigenfunction of $, we have 
^Pen ~ X n e n 

and consequently 

E{Ci4j 3 = < Xi e i , e j > = Xt ^ © i » 6.j > = 


a 
O’ i 


The expansion 


4(t„ W) = 2 ® n ( ^ ) 

n 

with mutually orthogonal coefficients 

E £ (• t } = 0 for i/j 

and 

EU 2 „ 3 - X n = a 2 n 

is called the Karhunen-Loeve expansion of £ (t, to). 


( 3 . 146 ) 


( 3 . 147 ) 


( 3 . 148 ) 


( 3 . 149 ) 


( 3 . 150 ) 


( 3 . 151 ) 


( 3 . 152 ) 


( 3 . 153 ) 
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We can now restrict ourselves to the study of positive definite 
covariance operators if, such that 

< ^x, x >h > 0 for every x£H and x ^ 0 

Indeed, if $ is simply nonnegative, the Hilbert space-valued random 
variable £ can be written in the form 

(3.154) 

£ n e n (3.155) 

£„ e n (3.156) 


E{ x*U?) /<*?> 3 (3-157) 

This means that £ and £* are indistinguishable with respect to their second 
moments, relevant to prediction problems and that £ can be replaced by £* . 
We can also replace the original Hilbert space H with a new Hilbert space 
H / , being the span of the eigenfunctions e„ of corresponding to- strictly 

positive eigenvalues X„ > 0. The new covariance operator & of £,a is the 
restriction of in H / and is positive definite. With this in mind, we shall- 
assume from now on that ^ is positive definite and A n = cr? > 0. 

From the Moore-Aronszajn-Loeve theorem [Parzen, 1961, p. 965), 
we know that the covariance function r(t, s) generates a unique Hilbert 
space H(k) of which k(t, s) = r(t, s) is the reproducing kernel. 

We shall next determine the structure (inner product) of H(k) and 
its relation to the Hilbert space II. 


£ ~ 4a + £e 


where 


and 


■ £ 

n, A n > 0 


£ 

n, X n = 0 


It can easily be shown that 

, * ,.w * 1 

E[ x (O y (4 ) 3 = 
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If £ n is an orthonormal system in H(k) and since k(t, s) is the 
reproducing kernel, we must have 

' k(t,s) = 2] a® e tt (t) e n (s) = £ e n (t) <r n (s) (3.158) 

n n 

We can therefore obtain an orthonormal system in H(k) by setting 

£ n — a n n 1,2,... (3.159) 

Let {, g£H have Fourier expansions 

f = £ < f ’ e n > e » = £ -f» e n , . S = Y. gn e « 

n n n 


so that 


< f, g > ■ = f n g n (3.160) 

H 

n 

If, in addition, f, g £ H(k), we have 
< f, g > o < £ f n e n , £ g k e* > 

H(k) H(k) 

a k 

** . §k a n £ n » a k € k > - 

n k H(« 

" EE *n gk 6 nk = 2 f n gn < 3 ‘ 161 > 

n k n 

We have thus established a definition of inner product for elements of 
H(k) that also belong to H 
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< f, g > = y a n 3 < f, e n > < g, e n > (3.162) 

H(Jt) H « 

n 

We will next show that minimum variance (probabilistic) prediction in the 
sense-of- Section- 3.3. T is equivalents deterministic prediction in a Hilbert 
space in the sense of Section 3. 3.2. In both cases the prediction is given 
as a linear combination of the vector of observations d, of the form 

d P = C T P C' 1 d (3.163) 

The difference is that in the probabilistic approach 

* * Ot) * * 00 

c u = ECMi ) 1 € )3, [Cfli = E{ii (4 )ip(4 )} 

(3.164) 

where 1*, 1* are observation functionals and 1* the prediction functional, 
while in the deterministic approach 

C t j = < l lf lj >- , [C P ], = < r*,ip >- (3.165) 

where li, 1 j, 1 P are the representers of the 1*, if. Ip functionals in some 
Hilbert space H. 

If H is taken to be the Hilbert space H(k) with reproducing kernel 
the covariance function r(t, s) of the stochastic process £(t, w) involved 
in the probabilistic approach, it will be. shown that 

« (M «“)!*(«“)} - < 3 - 166 > 

H 

and consequently the results of both prediction approaches are identical'. 

To establish the above equality, let x*, y* E II* be two arbitrary 
functionals in H with representers x, y in H and x , y in H(k), defined 
as 


x*(f ) - < f, X > 

H 

< f, x k > 

H(« 

(3.167) 

y*( f ) = < f» y > 

H 

< f, y k > • 

«00 

(3.168) 
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for every f £ H and also for f £H(k). 

Let the Fourier expansions of f, x, y, X*, y 5 ' in H (assuming x*, 
y k £ H also) be 



y 


Y y n e n , 

n 


X 


k 



0 n > 


Y y n en 


n 


n 


Then, using the definition of inner product in H(k), we have 


< f,X > = Y, fn X n = < f. 


X k > 


H(lc) 


Y a » S f » x * (3.169) 


and similarly 


^ fn y n = Y a ” 3 f ° y “ 


(3.170) 


It follows that 

X^ — 0"n Xn > yn = ^n y n 

and, consequently, 


(3.171) 


^ k k 

< x , y > 

H{k> 


Y <*** *» yn = 2 ^ (^X n )(a^ y „) * 


o n x n y 


(3.172) 


On the other hand. 
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„ * CO * . /*\ 

E { X ( £ ) y (£ ) } 


E{ < x > < £ w ,y > } 

H H 


. CO 


E ( ( E*. t.) ( €» ) I » 

n k 


E E X * y It E C £n Sk 3 


EE X « yk <*n 6 nfc 


n k 


n k 


Z a » x * y > 


(3 . 173) 


The desired equality follows directly: 


jjj 

E{ x (Z) y (4) } = < x k , y k > 

H(k) 


Z a * x » y > 


(3.174) 


Let l 2 denote the space of square summable sequences fi, f 2 , . . . , 


with 


2 fn < ® ( 3 * 175 > 

n=l 

Then l 2 is isometric to the original Hilbert space II, by the Riesz- Fischer 
theorem [Kolmogorov and Fomin, 1970, p. 153]. 

The space l 2 of sequences fi, £ 2 , ... with 


X a; 2 f* < - 

n= 1 


is isometric to the Hilbert space H(k) . 


(3.176) 
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Let us formally consider sequences as infinite vectors 
f = [fi,f a .... 1 T , i = [gi,g 8 , ... ] T (3.177) 

and also the infinite dimensional matrix P with p n = ctH 2 


P = 


Pi 0 
0 Ps 



(3.178) 



We can now formally write the inner products in H and H(k) as 


< f. g > ■ g T f , 

H 


< f, g > 

H(k) 


i T p f • 

(3 . 179) 


This establishes a sort of analogy with inner products in finite dimensional 
spaces appearing in linear least squares adjustment techniques. An 
arbitrary choice of P gives a "weighted least squares" deterministic 
solution. An optimal choice can be made when probabilistic reasoning is 
used and P is taken to be the inverse of an infinite dimensional matrix S 
with elements Sij = o? 6ij. S can be identified as the covariance matrix 
pf the infinite vector 


t - [&, 4s. ... 1 T 


of the Fourier coefficients of the corresponding stochastic process 
4(t, co) 

S = E{? I T } .(3.180) 
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3.4 Minimum- Error Bound Prediction 


As shown in the previous section, collocation solutions become 
probabilistically meaningful when the inner product is linked to the known 
covariance function of the stochastic process serving as a model of the 
underlying unknown function. In case such a probabilistic reasoning is not 
justified, or even when we just have no knowledge of the covariance function, 
a prediction can still be carried out with a more or less arbitrary choice of 
inner product, corresponding to a ''model covariance function." The result- 
ing variance of the prediction error is only a "model variance" and should 
not be used in drawing statistical inferences on the prediction. When 
probabilistic reasoning does apply and the true covariance function is 
known, then the prediction error variance answers the question: "How 
good is the prediction?" In a strictly deterministic context, a similar 
answer must be found if the prediction is to be of any use at all. We shall 
try to answer this question here. 

In a deterministic solution we cannot use variances (even though 
"model variances" can be computed), but a concept believed to be more 
legitimate is that of a bound of the prediction error. The error certainly 
remains unknown in general, but we might be able to ascertain that it does 
not exceed a certain bound in absolute value. 

Let us first see what a "model variance" of the prediction error 
corresponds to. Probabilistic prediction using a model covariance function 
k(t, s) corresponds to deterministic prediction (collocation) in a Hilbert 
space H(k) with k(t, s) as its reproducing kernel. The model variance of 
the prediction error c P = 1 P (£) - 1 P (£ ) is given by 

af - E { [ 1* ($“) - 1* (£“) f } (3.181) 

where 1 P is the functional corresponding to the prediction and 1 P is its 
projection on the data space M* (the span of the functionals corresponding 
to the observations) . 
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As we have already established in the previous section 


a 

CT P 


a 

k 


= E{[it(4“)-ip(4 w )f } = <ip-ip,i P -ip> = iUp-4 

(3.182) 


* ^ * 

where 1 P , 1 P are the representers of 1 P , 1 P in H(k). 


Suppose that in general we are given a set of observations corres- 
ponding to functionals It -on an unknown deterministic element £ E H(k), with 
representers li in H(k); and we want to predict the value of a functional 1* with 
representer 1 P in H(k). Knowledge of the values of the observations 


di = 1* (£) = < £,■ 1 4 > 

H(k) 


(3.183) 


allows us to find the projection £o of £ on the data space M = span (li, i = 1, 
2, . . . , n). If x is an arbitrary functional with representer x, we cannot 
in general compute x* (£) except when x E M. In this case, using the 
decomposition 

v ' - * * 

£ = €o * £' , £ 0 E M, £' E M 1 (3.184) 

we obtain 

x*(£') , = <£',x> = 0 (3.185) 

H00 

since £' .-l x , and . . 

x*(£) = x*(£ 0 +£') = x* (£ 0 j (3.186) 

We can compute x* (£o), since £o is known. 

This leads us to seek a functional x with representer x in M, so that 
we can compute 

• ' dp = x*(£) ’ - (3.187) 

as an approximation to the true but unknown value 

dp = 1 P *(£) (3.188) 


For any x E M, the prediction error is 
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(3.189) 


€ P = dp-dp = 1 P (£) - x (£) = <1 P ,4> _ <x, £ > 

HpO «W 

= < 1 P -x, £ > 

H(kl 

Using the Schwarz inequality [Davis, 1963, p. 159], we obtain 

| € p | 3 = | < 1 P - x, £ > I 2 £ ||l P -x|| 2 II C Ilk < 3 * 190 ) 

H(k) 

s r 1 

and 

|€ P | * If I, < 3 - 19I > 

We have thus found a bound for the absolute value of the error of prediction. 
An obvious criterion of prediction optimality is to try to make this bound as 
small as possible. £ is unknown but fixed and consequently |j £ || k is fixed. It 
remains to minimize || 1 P - x j| k , i.e. , to find an element xEM, which satisfies 

|| i P - x |U = min || 1 P - x || k (3.192) 

x E M 


The solution is well known to be 

x = Ip « P M ( 1 P ) (3.193) 

This choice differs only in motivation from the collocation approach where £ 
is approximated by £ 0 = (£), and then the value of any functional 1* (£) is 

approximated by 1* (£ 0 ). This follows from the fact already established in 
Section 3.3.2, i.e. , 
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(3.194) 


M4o> - <ip»P M (€)> w < -Pm ( 1p) ’ * > m 

= < e* ip > = 4(0 

H(fc> . . 

This approach explains why and in what sense the approximation £o to £ 
provides optimal estimates 1 P (£o) of quantities 1 P (4). The optimality 
criterion is minimization of the error bound 

' ’ II Ip - Ip Ik II ^ II k 

If in some way it is possible to obtain a bound for the norm of £ 

|| £ || k * M k (3.195) 

and computing 

- ‘ o* = 1| 1 P - lp||g (3.196) 

we have a bound 

| e P | ^ m k f, °p Mk . .. (3.197) 

When, therefore, a "model covariance function" is used, the computed "model 
variances" of prediction errors must be multiplied by the number Mk to obtain 
the square of the bound for the absolute value of the prediction error. 

Obviously Mk depends on the inner product in H(k) and consequently on 
the used model covariance k(t, s) . 

We shall call Mk the "covariance model error bound number." We 
shall later return to the problem of finding Me in the case of predictions 
related to the gravity field of the earth. 
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3.5 Kahnan-Bucy Filtering 


A linear filter is a transformation L of functions x, y: T R, TCR, 
with two properties: linearity, 

L («x + £ y } = <*L {x } + £ L {y } (3.198) 

and "time" invariance, if 


y(t) = ( Lx ) (t) 

x'(t) = x(t + T) = (U T x ) (t) 

then 

y = Lx - LU x = U y = U Lx 


(3.199) 


(3.200) 


and LU = U L, i.e. , the filter commutes with the "shift" transformation 
T T 

U . 

T 

We shall consider here linear filters having a time domain repre- 
sentation in the form of a convolution integral 


+ 00 


y(t) = (Lx) 


(t) 


I h(u 


) x( t - u ) du 


(3.201) 


The filter is completely determined by the kernel h(u), called the impulse 
response function of the filter. 

Linear filters may also be formally considered to transform stochastic 
processes x(t, to), y(t, to). A more rigorous discussion of some particular 
filters will be undertaken in Chapter 5. In the following discussion, we 
consider only stochastic processes; and we can therefore suppress the 
variable co without danger of confusion. 

Given an observed outcome of a stochastic process y(t) related to an 
unknown stochastic process x(t) to be estimated, the filtering problem is to 
find an impulse response function h(u), such that the output of the related 
filter with input y(t) 
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+ CO 


x(t) = (Ly) 


(t) 


/ h(u 


) y(t-u)' du 


(3.202) 


best approximates the signal x(t), in -the sense of minimizing 


E {[x(t) - x(t)] 3 } for every t G T. (3.203) 

Since the observed function. y(t) cannot be known for future values, we must 
restrict the admissible impulse response functions to those with h(u) = 0 for 
u < 0, for the related filter to be physically realizable. 

For weakly stationary processes x(t), y(t), with 


E { y(t) y(t+ t) } = l/)yy ( T ) 

E{ X(t) y(t+T) 3 - 0xy(T) 


(3.204) 


the answer to the minimization of E { [x(t) - x (t)] 2 } is given by the Wiener- 
11 opf equation 


0xy ( T ) 



h(s) ij)yy ( T - S ) ds 


(3.205) 


The concept of a process y(t) continuously observed over all its past history 
is clearly a mathematical idealization. Consider instead a vector y of a 
finite number of observations 


Yi = y(ti) - i = l, 2,- ..., n • • (3.206) 


and the estimation of the value of x(t) at a certain epoch t P , x P = x(t P ). The 

4 t * 

filter equation 


x P 


J h(u) y(t P -u) du 
0 


(3.207) 
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should be replaced by a summation 

x P = L ^tp-tj) y(tj ) = S hj y t = h T y (3.208) 

i i 

In a similar fashion, the Wiener-Hopf equation should be replaced by 


$*y(tp-t,) = E h(t p -t t ) $ yy (t t - tj) (3.209) 

or in matrix notation 


C yy h = C yx (3.210) 

where 

[Cyyhj = = E { y(t t ) y(tj) } 

(3.211) 

[CyJi = ^ X y(tp -t t ) = E'{ y(t { ) X(tp) } 


and finally 

x P = C y T x Cy y y (3.212) 

This is exactly the formula for linear least squares (minimum variance) 
prediction. We can consider the filtering problem and the Wiener-Hopf 
equation as a generalization of minimum variance prediction for continuous 
observations. 

Wiener [1949] and Kolmogorov solved in the early 1940’s the problem 
of filtering a signal additively corrupted by noise, under the assumptions of 
stationarity, ergodicity, and knowledge of the entire past of the observed 
process . Wiener's results were expressed in the frequency domain and 
could not be directly extended to the nonstationary case. The work of 
Kalman and Bucy extended Wiener's results to the nonstationary case and 
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a process observed over a finite time interval only [Kalman, 1960; Kalman 
and Bucy, 1961]. 

Kalman and Bucy considered the vector process to be estimated to be 
the state x(t) of a dynamical system described by a linear differential equation 

dx(t) 

— — = F(t) x(t) + G (t) u ( t ) , . x(t 0 )=x 0 

dt (3.213) - 


and an observed process- 

z(t) = H (t) x (t) + v ( t) (3.214)- 

where u(t) and v(t) are white noise zero mean stochastic processes with 
covariance matrices 


E { u(t) u T (s) } = Q(t) 6 (t-s) 

E [ v(t) v T (s) } = R(t) 6(t-s) 

E [ u(t) v T (s) } = 0 


(3-. 215) 


where 6 (t - s) is the Dirac delta function. x 0 is a random vector with 

E-.{x 0 }=m 0 , E {(x 0 -m 0 ) (x 0 -m 0 ) T } = Q 0 , E [x 0 v T ( s ) } = 0. 

(3.216) 

The solution of the filtering problem x(t) is the minimum variance estimate 
of x(t) based upon the observed process z(t) over the interval [t 0 , t] and is 
given by the solution of the following differential equation. (The most 
. compact notation x t , F t , etc', is introduced in place of x(t) , F(t), ... .) 

dx t 

= F t xt + K t ( z t - H t x t ) x(t 0 ) = m # 

dt 

(3.217) 
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where 


K t = P t Hj R\ , P t = E{ (x t ~x t ) (x t -x t ) T } (3.218) 
.- and the error covariance matrix Pt is the-solution of 


1 T t - 1 t 

= F t P t + Pt F t - P t H t Rt H t P t + G. t Qt G t , P 0 = Qo- 

dt • ' (3.219) 

The continuous observations case is hardly of any interest in geodetic 
problems where observations are discrete and finite in number. With further 
reference to [Bucy and Joseph, 1968; and Sage and Melsa, 1971] for more 
details, we turn our attention to the case of discrete observations, 

y(t t ) = H(tt) x ( t{ ) + v i , i-1,2, . . ,n; E^Viv/} = R t 6 n . 

(3.220) 

A solution for minimum variance estimates x (ti) of x(tt) can be obtained with 
the use of least squares adjustment techniques, if the means and covariances 
of the random variables x(ti) were known. Estimates x(t) for epochs other 
than the observation epochs ti can be obtained from x(tj) by least squares 
prediction techniques if the mean and the covariance function of the signal 
x{t) were known. The required first- and second-order statistics of x(t) 
can be obtained from those of u(t) and Xq. The solution to the state differ- 
ential equation 

dx t 

= F t x t + G t u t (3.221) 

dt 

is of the form 


t 

x t = *(t,t.) x 0 + J *(t,€) G^ d$ . 

t 0 (3.222) 
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The definition of the above integral raises some problems that we shall 
examine in Section 5.2 . (t,, s ) is the state transition matrix obtained 

by solving the differential equation 

~ $(t,t 0 ) = F(t) $(t,t 0 ) , $(t 0> t o ) = I (3.223) 

and using the transition property 

$(t,t 0 ) = $(t,s) $(s,t 0 ) (3 . 224) 

The required mean and covariance functions are 

'm t = E{x t } =" $(t,t 0 ) m 0 (3.225) 

C xx (t>s) = E {(x t -m t ) (x 8 -m 8 ) T } = 

min(t, s) 

= $(t,t 0 ) Q 0 <l> T (s,to) + J <i>(t,£) G T ^ $ T (s,£)d£ 

t 0 (3.226) 

Suhc a global approach (in the sense of processing all observations together) 
shows the relation to least squares techniques but has great computational 
disadvantages due to the necessity of inverting large matrices. The alterna- 
tive is a sequential solution where x (ti | ti), the minimum variance estimate 
of x(ti) based upon past observations y(tj) (j - 1 , 2, . . . , i), is obtained from 
the similarly defined estimate x (t t _a j ti-i). It is possible to obtain a global 
solution for x (t t | ti) using observations up to epoch ti only, and a similar 
global solution for x (ti_x | tj_i), and then show the relation between the two. 
However, this involves an enormous algebraic effort, and it is much easier 
to derive directly the one -step solution for obtaining x (tj | ti) from x (ti-i | 

ti-i). 
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First a discrete state model can be obtained 


Xi+i - $(i+l,i) xt + w 1+1 


by setting 


Xi = X(tl) 

$(i, j) = ®(ti. tj), and 


(3.227) 

(3.228) 

(3.229) 


W i 



$(t lt £) G(£) u(£) d$ 


(3.230) 


wi is a white sequence with E [wi } = 0 and E {wj wj } = Qi 6j j where 


Qi 


j <&(t,,S) G(£) Q(£) G T (£) $ T (t t ,€) d$ . 
tj_i (3.231)- 


A somewhat more general discrete state model is of the form 


x t+i = $(i+l, i) x t + T\ w j+i . (3.232) 

x (i + 1 | i+1) = x (ti+i | t 1+i ) can be obtained from x (i | i), the covari- 

ance matrix P(i j i) of the error in x (i| i), and the observation 

' < f 

y 1+1 - H 1+1 X I+ 1 + • v 1+ x (3.233) 

with the help of the following algorithm [Jazwinski, 1970, p. 270]: 
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x(i + l | i) = $(i+l, i) x(i |i) 

t 

P(i + l|i) = *(1+1, i) P(i | i) <f> T (i+l, i) + r,Q 1+1 vl 

K i+l = P(i+1 l i) H 4 T +1 [ H 1+1 P(i | i) Hj T + 1 + R 1+1 f 1 

x(i+l | i+1) = x(i+l | i) + K 1+1 [ y 1+1 - H 1+1 x(i + l |i) ] 

P(i + 1 j i + 1) - [ I - K 1+1 H 1+1 ] P(i + l|i) = 

= [I-K 1+1 H 1+1 ] P(i + l|i) [I-K 1+1 H t+1 } T + K I+1 R 1+1 Kj T +1 

(3 . 234) 

We shall next derive this algorithm with the use of the least squares 
adjustment (condition equations) technique. 

Consider that an a priori unbiased estimate Xi of xi is available, 
and Ci is the a priori known covariance matrix of the error Xj - Xi. An 
a priori unbiased estimate Xi +1 of xt+i and the a priori covariance matrix 
Ci+i of the error 6xi+i = x 1+ i - x 1+i can be obtained by simple propagation, 


x 1+ i = $(i + l,i) x t 

c i+i = §<i+i,i) c t $ r (i+i,i) + r, q 1+1 rl 

The observation equations can be rewritten in the form 


(3.235) 


1 i+i = y i+i - Ht+i x 1+1 - H 1+1 6x 1+1 + v 1+1 - [ H 1+1 I ] 


6Xi +1 

v 1+ i 


(3.236) 
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where lj +1 is known and 6 xi +1 has zero mean and covariance Ci+i. Our model 
is of the condition equation type B V = W (see Section 3. 2. 2) and the 
weight matrix of the zero mean vector 


- 


-X 

6x 1+ i 


C 1+1 0 


is P - 

-1 

Vi+l 


0 Hi+i 



J 


(3.237) 


The solution is of the form 

V * P 1 B t M 1 W (3.238) 

with M = BP 1 B r . Applying this to our case, we obtain 


M = [ H 1+ i I ] 


— 




C i+i 

0 


H j T +i 

0 

H i+i 


I 


C 1+1 H 1+1 + R 1+ j 

(3 . 239) 


6xi+i 


C i+i 

0 

Vi + l 


0 

Ri+i 


M' 1 1 1+1 


(3.240) ’ 


Setting Kt+i = C t+ i Hl +1 M' 1 , we obtain 

i+a ” K 1+1 1 i+i — ^l+i ( y i+i - Hi+i x i+i ) 

(3.241) 

x (i+1, i+1) = x 1+1 + 6x 1+1 = x 1+ i + k i+i ( y i+i - H 1+1 x 1+1 ) = 
~ Kj+i y i+i + ( I “ Ki+i H 1+ i ) x 1+1 (3.242) 
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The covariance of yi+i if Ri+i, and by simple propagation the covariance of 
the error in x (i + 1 j i+1) becomes 

P (i+1 1 i+1) = K 1+1 R 1+1 K, T +1 + 

+ [ I " K 1+1 H t+ x ] C 1+x [ I - K 1+1 H 1+1 ] T 

(3.243) 

Identifying x t , xi+i, Ct, Ci+i with x(i | i ), x(i + 1 j i ), P(i j i) and P(i + 1 1 i), 
respectively, the algorithm follows directly. 

This solves the filtering problem of finding an estimate x (i | i ) of x x 
based upon past observations only. After x (n|n) is obtained, the estimates 
x (i| i) must be updated for the effect of future observations yt+i, yt+s, ...» 
y n to obtain x (ij n). This is the smoothing problem, and finding x(t| n) for 
any other epoch t is the prediction problem. Refer to [Liebelt, 1967, 

Section 6-8] for smoothing algorithms . 

Our main point here has been to show that for the case of discrete 
observations Kalman-Bucy filtering techniques are equivalent to the familiar 
least squares (minimum variance) adjustment methods. 
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4. DETERMINISTIC AND STOCHASTIC MODELS OF 
GEODETIC PROCESSES 

4. 1 Introductory Remarks 

Physical processes related to geodetic work can be loosely divided 
into three categories. The first two correspond to the traditional objectives 
of geodetic research: Determination of the gravitational field and the shape 
of the earth. 

The first category thus includes the gravity field of the earth and 
processes that result from transformations of the gravity field such as 
gravity anomalies, geo id undulations, etc. 

The second category includes processes related to changes of the 
earth’s geometric shape with time (station drifts, earth tides, etc. ) as 
well as to changes of the earth’s position in inertial space (precession- 
nutation, polar motion, variations in rotational velocity, etc.). We shall 
coliectively call such processes ’’earth motion processes. " By similarity 
one might include in. this category processes related to the motion of other 
celestial bodies (lunar theory and librations, planetary motions, etc.) 
which might be involved in geodetic work. 

The third category includes "noise processes, " i.e., processes which 
although of no direct interest (to the geodesist at least) still appear in experi- 
ments directed towards the determination of geodetic parameters and processes. 
Having in mind current techniques for obtaining geodetic data, we might 
mention atmospheric refraction, nongravitational accelerations acting on 
artificial satellites, and, of course, observational noises associated with the 
observing instruments themselves. 

The role that modeling of physical processes plays in the estimation of 
geodetic parameters is directly proportional to the accuracy of the available 
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observational techniques. In the presence of large observational inaccuracies, 
simple models, or even complete omission of the effect of the process, may 
be of no significant consequence on data analysis. In fact, advances in obser- 
vational accuracy made possible the discovery or identification of processes 
whose existence had been preestablished by theory. 

Recent advances in observational techniques necessitate the use of 
more sophisticated models if the objective of a "centimeter level geodesy" is 
to be realized . 

Among models for processes which appear to be of critical importance 
in view of present day observational techniques, we shall outline here the two 
which we consider most important — the gravity field and the rotation of the 
earth. 

4. 2 The Gravity Field of the Earth 

4.2.1 The Model 

The gravity field of the earth is usually divided into two parts: a 
reference field (normal gravity potential) and the anomalous or disturbing 
potential. The disturbing potential T is known to belong to the class of 
functions harmonic outside the surface of the earth and regular at infinity 
(disregarding or including the effect of the atmosphere in the reference field). 

It is also known from Range's theorem (see [Krarup, 1969]) that T may 
be approximated arbitrarily well, in a certain sense , from members of the 
class of functions harmonic outside a sphere contained in the earth's interior 
(Bjerhammar sphere), and also regular at infinity. Such functions may be 
expanded into a series of spherical harmonics 

00 n 

f(p) = S fn ° 6,18 <p> {4A) 

n“0 

where 
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e nB (P) 


m s 0 



(n-|m 

Lli 

(n+| m 

1)'. 


cos j m j X P 
sin | m | X P 


Pm mi (COS 0 p) 

m < 0 
.(4.2). 


where r P , X P , 0 P are the spherical coordinates of the point P, R is the radius 
of the Bjerhammar sphere, and P nm (cos 0 P ) are associated Legendre polynomials. 
In view of the relation 


enm (P) e>a (P) d°> - 6 nk 6 nl (4.3) 

S R 

(Sr denotes the surface of the sphere of radius R), we can consider a Hilbert 
space of potentials H, with inner product 



<f, g> = j f(P) g(P) dor P , f, g£H (4,4) 

Sr 

and with enm (n = 0, 1 , . . . ; m = -n, . . . , -1, 0, 1 , . . . , n) as an orthonormal 
basis. (For a more rigorous discussion see [Lauritzen, 1973, Chapter 2].) 
Two functions with Fourier expansions 


f(P) = E f nm e n m (P) ' and g(P) = ^ ^ gnmenm(P) (4.5) 

. n , n n, m 

have inner product 


co n 

< f, g ^ ^ ^ ^ Sam 

11=0 , m=-n 


(4.6) 


In view of Dirichlet’s principle [Heiskanen and Moritz, 1967, p. 18], we can 
consider the restrictions of potentials on the surface of the Bjerhammar 
sphere, constituting a Hilbert space H with the above inner product. 
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The disturbing potential T(P) can now be modeled as a random field 
T(P, co) with sample functions T*° (P) in H or, alternatively, as a Hilbert- 
valued random variable with values in H. The Fourier coefficients T am of 
the expansion of T in H are random variables 


oo n 

t(p, co) = 2^ 2^i Tnn(w) en " (P) 

n=0 m=-n 


(4.7) 


An additional restriction is that T(P, co) should be a statistically homogenous 
(isotropic) random field, i.e., that for any set of points Pi, i = 1, 2, . . . , k, 
the joint probability distribution of the random variables T(Pi, co) is identical 
to that of the random variables T(Qt, co), where 

r = Mr (4.8) 

Qt Pi 


r_ , r_ are position vectors of points Qi, Pi, and M is a matrix repre- 

Qi Pi 

sentation of any rotation on the sphere (MM T = I, det M = 1) . Obukhov [1947] 
has shown that for the random field T(P, co) to be isotropic, we must have 

E-{T nB T k i} = o\ 6 nk 6 0l (4.9) 


This implies that the admissible covariance functions for isotropy are those 
represented by a covariance operator H H, having (P) as its eigen- 
functions and with corresponding eigenvalues <j n 2 n independent of m 

^ew B (P) - or 2 ^(P) (4.10) 

Any appropriate sequence of "coefficient variances" ff n (i.e. , such that 
T(P, co) is a second-order random field) defines a corresponding covariance 
function 


r(P, Q) = 6nm (P) (Q) (2n+l) Pn(COS^pq) 


(4.H) 
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r(P, Q) 


(4.12) 



Cn Pn(COS^(K}) 


r# pq) 


We have -used -here the -addition formula for spherical harmonics [Muller, 1966, 
p. 10], and we have introduced the "degree variances" c^ [Heiskanen and 
Moritz, 1967, p. 257], is the angular distance of the two points P and Q. 

Lauritzen [1973] has proved that such a random field, under the 
restriction of Gaussianity, is not ergodie; and therefore its covariance 
fa notion cannot be found through sampling by taking averages over the sphere. 
In his words [p. 80]: - 

.... This means that, even if we knew gravity all over the 
earth, we would not be able to find the true value of the 
covariance function. . . . Somehow the problem is not 
suited for statistical treatment. . . . 

However, one can still use a "model covariance" r(P, Q) and interpret the 
algorithm as deterministic prediction (collocation) in a I-Iilbert space H(k) 
with r(P, Q) as its reproducing kernel. We have already shown how such 
an approach can be motivated from a "minimum error bound” point of view. 

The application of this concept to prediction problems related to the gravity 
field of the earih leads to criteria for the optimality of the model covariance 
function and thus opens the way for the solution of the very important 
problem of the choice of norm (inner product) for the Hilbert space H(k) of 
potentials . 


4.2.2 Application of Minimum Error Bound Prediction in Gravimetric 
Geodesy and the Optimum Norm Choice Problem 

Suppose that we want to predict some quantity related to the earth's 
disturbing potential T from observations also related to T. We shall 
symbolically denote this problem by the triplet (D, p, k) where D stands for 
the set of observation functionals, p for the prediction functional, and k for 
the covariance-reproducing kernel of the I-Iilbort space II(k) in which the 
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prediction is taking place . The solution algorithm leads to a model variance 
ct 2 of the prediction error . We have found that the prediction error is 
hounded by 

M * a || T ||k (4.13) 

where || T ||k is the norm of T in H(k) 


1 T Jlk ~ (4 . 14) . 

n t m 

T nn are the coefficients of the Fourier expansion of T not in H(k) .but in the 
Hilbert space H with orthonormal basis e nB . a 2 are the coefficients in the 
expansion of the reproducing kernel of H(k) (model covariance function) 


r(P, Q) 


k(P, Q) 



©no (P) ©na (Q) 



a 2 (2n+l) P n (cos (|)pq) 
(4.15) 


If we could find a bound for each coefficient 


I T nm j ^ Bnm 

then a bound Mk of || T |jk can be found: 


(4.16) 




(4.17) 


A proper place to look for bounds of the potential coefficients is the density 
function p of the mass of the earth which, in the first place, gives rise to 
the potential itself. 

This has already been done by Cholshevnikov [1965 and 1968] . The 
bounds refer to the total harmonic potential of the earth: 


V(Q) 


oo n 


n =0 m=*~n 


v nB 


©an (Q) 


(4.18) 
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In our notation these bounds are (see also [Payne, 1973]): 





4 77 Gpmx 
(n+l)\j2n+l 



n+1 


(4.19) 





8 IT G pnax 
(n+1) yj2n+l 



(n -m) 
(n+m) 


m > 0 (4.20) 


where p m y. is the maximum density of the earth, G is the constant of gravita- 
tion, R is the radius of the Bjerhammar sphere, and a max is the maximum 
distance of the earth surface from the geocenter. Cholshevnikov also gives 
bounds better than those above which, however, depend on the maximum 
variation of density with longitude (X), v^, nax , and with colatitude ( 0 ), 
vt, tax (t =cos 0 ) as follows: 

i i 

4G (22)^ (7T) (2 Vt, Bax Pb ax) / ^Bax \ 

fVnol * , (4.21) 

(2n+l) (n+ 1) \(2n + l) (n- 1) \R / 


I V n , 0 
I Vn,- m 


£ 


8 G V X> Bax 

■^2n+ 1 (n+ 1) m 


/aBax\ n+1 /(n-m)’. 

\ R / M(n+m)\ 


m > 0 


(4.22) 


Using the known coefficients of the harmonic part of the reference potential, 
we can compute bounds for every coefficient T affi , find the bound M* of the 
disturbing potential norm,. and finally the bound of the prediction error 

|e| £ m k = 0 M k (4.23) 

This bound is independent of the observations and neglects the information 
about T contained in the data 

di = <T,l k i> H(k) i = l,2 n (4.24) 

To take this information into account we decompose T as 
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T = #m(T) + ^h-(T) - To + T' (4.25) 

where M is the span of the observation functionals (data space) and M 1 its 
orthogonal complement in H(k) . 

The prediction error now is 

€ = <lp ~lp, T 0 + T ; > = <$-!*, T'> (4.26) 

H(fc) H(k) 

since l k P - Ip J- To; and 

| € | = | < lp - 1 p , T* > | * llp-^plU || T' || k = ff|| T' |[k (4.27) 

H(k ) 

where lp is the representer of the prediction in H(k), and lp is its projection 
on the data space M. This is a better bound, since from the Pythagorean 
theorem 

||.T |k = | To ||k + II T' $ and || T' || t s || T I* (4.28) 

The new bound for the prediction error becomes 

|e| * m k = a fig -||T 0 |g (4.29) 

The error bound, of course, depends on the choice of kernel in H(k), i.e., 
on the choice of a model covariance function. 

—If ki and k 2 are two kernels giving rise to two error bounds 
m^, we say that kernel kj. is better than kernel kg for 
predicting a certain quantity from a certain data set if mj^ 

< “kg- 

We have thus found a tool for comparing kernels, and the definition of the 
optimal reproducing kernel (or optimal, inner product or optimal model 
covariance function) follows directly: 

—We say that a kernel ko is the best kernel for predicting a 
certain quantity from a certain set of data if the corresponding 
error bound mo satisfies 

mo(D, p, ko) = min m(D, p, k) (4.30) 

keJi 
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where m(D, p, k) denotes the dependence of the error bound on the obser- 
vations D, the prediction p, and the kernel k. denotes the class of all 
"permissible" kernels. The class JC is difficult to define. An obvious 
-necessary condition on members of is that the corresponding bound of 
the disturbing potential norm is finite 

00 

= /] cr n 3 

n — 0 m--n. 


E 


Run 


< 00 


(4.31) 


It is more reasonable to look for the optimal kernel in a class of kernels 
convenient for computations. Such kernels must be given by closed expres- 
sions rather than by an infinite sequence cr 2 . One could even try to find a 
closed expression including a finite number of parameters, such that the 
family of kernels corresponding to different sets of parameter values is 
"broad" enough in some sense. Then the optimal set of parameters could 
be found as the one giving the smaller prediction error bound. However, 
the dependence of the bound on the kernel is quite complex: 

m(D, p, k) = CT(D, p,- k) { M 2 (k) - || To l 8 (D, k) } 2 (4.32) 


where M 2 depends on the kernel, || To || 2 on the kernel and the observations, 
and a on kernel, observations, and prediction. More specifically 

<r 2 = Co - Cp C' 1 Cp, Co = < lp, lp > (4.33) 

H(k) 


with 

(Cp), = <l k t ,lp> , 

H(k) 


Ctj = < l 1 


l* kf 


t > 


HOO 


(4.34) 


where 1* , lp are the representers of the observation and prediction 
functionals in H(k) . We also have 


n 

a, 1* (4.35) 

l= 1 



where the coefficients a, are found from the solution of the normal equations 
(see Section 3.3. 2) 
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a 


(4.36) 


It follows that 


= C' 1 d 


I To || 2 


n n 



i= 1 J ~ 1 


a T Ca 


d T C' 1 d (4.37) 


The final equation for the prediction error bound is 

i* 


m 


= [ C 0 - cj C 1 C P } : 


£ 

n, n 


a; 2 Bnm - d T C 1 d 




(4.38) 


Another point to be made is that one is usually interested not in one but in a 
number of predictions, and the idea of applying a different optimal kernel for 
each particular prediction is not appealing, in view of the computational effort 
involved. In this case, one might instead introduce a risk function, e.g. , a 
nonnegative function L(m) of the vector of prediction error bounds m, with 
the following properties [ Jazwinski, 1970, p. 147]: L(0) = 0, L(Ui) a L(u s ) 
s 0 for | Ui | 2 : [ u 3 1 , where | u | = (u T u) 2 . The optimal kernel may now be 
defined as the one minimizing the risk function rather than any of each 
individual bounds . 

The objective of our discussion here has been not to give any final 
answers, but rather to provide guidelines and motivation for more work on 
the question of norm optimality in gravimetric collocation. An interesting 
and motivating discussion on this problem can be found in [Eeg and Krarup, 
1975, especially Section 5]. 

Our approach is novel and, what is more important, completely 
independent of any probabilistic reasoning. The findings of Lauritzen 
(nonergodicity) stand in support of a purely deterministic approach such 
as ours and have actually motivated our work. Our criterion of prediction 
accuracy (error bound m) shares the same nice asymptotic behavior with 
probabilistic techniques: If the prediction functional approaches tho data 
space M (in the sense that |j lp - (lp) ||it 0), then the variance of tho 
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prediction, error tends to zero (ct 2 ■* 0), and also m •* 0 since cr is one of the 
product terms in m. 

In a probabilistic technique, the variance ct 2 is independent of the 
actual outcomes of the observations’. It only depends on what we have 
observed and what we want to predict. In our approach m depends on the 
outcomes of the observations through || To j|, the norm of the known component 
of the potential T (its projection on the data space). The closer T is to the 
data space, the larger || To ]| becomes, and the smaller the bound m of the 
prediction error. The main defect of our approach is the neglect of the 
effect of observational noise. An extension is obviously needed, either in a 
deterministic sense introducing bounds for the observational errors too, or 
by means of some combination of deterministic and probabilistic concepts. 

4.3 The Rotation of the Earth 

4.3.1 Choice of Reference Frame and Parameterization 
of Earth Rotation 

In this Section we are concerned with the modeling of the motion of the 
earth with respect to an inertial system within the framework of classical 
mechanics. Depending on the type of available observations, the inertial 
system is realized with the help of the dynamics of the solar system, the 
stars (taking care of proper motions), or extragalactic radio sources. The 
motion of the earth can be mathematically described by an infinite set of 
functions 

Xi(t), Yi(t), Zj(t) 

for all points i of the earth with inertial coordinates X lt Yi, Zi. In reality 
only the motion of points on the surface of the earth is directly observable, 
and practical reasons confine us to a finite system of points i = 1, 2, ...» 
n (network stations). The vector valued function 

X(t) - [Xi Yi Z n X;, Y a Za . . . X„ Y n Z„] T (t) (4 . 39) 
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gives the state of the point network for each epoch t and provides a descrip-; 

tion of its motion. j 

| 

Any arbitrary time dependent transformation matrix M(t) (i.e. , such 
that M(t) M T (t) = I and det M(t) = 1 for every t), together with a vector 
valued function 6 (t), defines a new moving reference frame x, y, z . The 
coordinates of the network points in the new system are given by 


yi 

L Zl J <t) 


= M(t) 


1 

>< pH 

L 

+ 6(t) 

Z*‘ 

(t) 

L J 


(4.40) 


or 

r, (t) = M(t) {R t (t) + 6(t)} (4.41) 

We can group all possible such frames into classes through the following • 
definition: Two given reference frames defined by 


ri(t) = Mj. (t) { R(t) + 6 1 (t) } 
r 2 (t) = M 3 (t) {R(t) + 63 (t) } 


are said to be equivalent or rigidly connected if there exists a constant 
transformation matrix Mi 2 and a constant vector 6xs such that 

r 2 (t) = M 12 £ rj.(t) + 6 12 } (4.43) 

If the earth, or just the network of points in question, were rigid, it 
would haye been possible to find a class of equivalent reference frames such 
that the coordinates of the points with respect to any of them would be time 
independent. For a nonrigid network, a reasonable choice is a class of 
equivalent frames such that the relative motions of the points are minimized 
in some certain sense, for example, 


E(/V<?) *•(«'• (*)’*) 


min 


(4.44) 
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We can, furthermore, eliminate translatory motions by moving the 
origin of the inertial frame to the geocenter, taking into account the corres- 
ponding effects on the observations (e.g.-, secular and annual aberration, 
annual parallactic displacement, relativistic effects). We can also identify 
a particular frame from the optimal class with a frame of the same class with 
origin at the geocenter and axes parallel to the former. By identification here 
we simply mean that the rotation of both systems are described by the same 
set of parameters, and we can therefore take advantage of the simplifications 
in the equations of rotation when referred to a geocentric frame . 

Alternative choices for a frame fixed, or rather attached, to a net- 
work of points on the earth can be based on physically meaningful geometric 
characteristics of the earth such as the geocenter and the principal axes of 
inertia. Such choices become relevant only when available observations are 
sensitive to such a physically appealing choice of frame as, for example, in 
die case of satellite observations in orbits governed by the gravity field of 
the earth. It is well known [Heiskanen and Moritz, 1967, p. 62] that the 
geocenter and principal axes of inertia are connected to the first- and second- 
degree spherical harmonic coefficients in the expansion of the attraction 
potential of the earth. 

Even for strictly geometric observations involving both "earth” 
points and "inertial" points (stars, quasars, points on the moon, etc.), the 
directions of the principal axes of inertia appear implicitly in the equations 
of the motion of the earth. However, the little sensitivity of present observa- 
tions to such natural geometric characteristics of the earth and the uncertainty 
present in the relevant equations of motion seem to justify the use of an 
"arbitrary" frame, especially since relative motions of network points can 
now be estimated to a comparatively high degree of accuracy. 

The problem of connection between an arbitrary frame and a physically 
meaningful one can be treated separately when sufficient observations for this 
purpose arc available. 



Our choice of an arbitrary "earth -fixed" frame from a properly 
defined optimal class of equivalent frames (more precisely its geocentric 
parallel from the same class) coincides with the concept of "geographic 
axes, " "attached in a prescribed way to the observatories, " as discussed in 
[Munk and MacDonald, 1960, p. 11]. 

The rotational time dependent transformation between the arbitrary- 
geographic geocentric frame-and the quasi- inertial geocentric frame can be 
described with the help of three parameters defining the transformation 
matrix M(t). Among possible choices, a traditional one is that of the Euler ian 
angles <p, 0, $ [Goldstein, 1950, p. 107] 

M(t) = Ra(!/)(t)) Ri(0 (t)) Rs (<P(t)) (4.45) 


where Ri, Rs are rotation matrices about the x and z axes respectively, 
and M(t) transforms inertial into earth-fixed vectors as follows 


Xi 


V 

yi 

= M(t) 

Y t 

_ Zl _ 

(t) 

Zi 


(t) 


(4.46) 


4.3.2 The Dynamics of the Rotation of the Earth, The Liouville 
Equation 

The instantaneous rotation vector of the earth co with respect to the 
earth-fixed system is connected to the Eulerian angles through the geometric 
Euler's equations [MacMillan, 1960, p. 185] 




sin 0 sin cos 0 0 


<P 

Us 

= 

sin 0 cos ip -sintf) 0 


0 

a>3 


cos 0 0 1 


j. 


or 

— cte 

“ = S ( e)5T ’ fi ( e, 5 “ W> 6 *1' 
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(4.48) 


The rotational motion of the earth is governed by the Liouville 
equation [Munk and MacDonald, 1960, p. 9] 

— /c co ■+ h i + [ajA] i'C cU +h it ? = (4.49) 

dtl-(t) (t) . (t)f 5 (t) \-(t) (t) (t) I (t) 


where 


[tt)A] = 


0 —OO 3 COg 
<0 3 o -Wi 

-C0 2 00 1 0 


and 


C = 


A -F -E 

-F B -D 

-E -D C 


/ 


earth 


/ + z 2 
-xy 
-xz 


-xy 

x 2 +z 2 

-yz 


(4.50) 


-xz 
-yz 
x 2 +y 2 


dm 


(4.51) 


is the matrix of moments (A, B, C) and products (D, E, F) of inertia 
(matrix representation of the inertia tensor Cij). 


h = 


/ 


earth 


dm 


( r = [x y z] T ) 


(4.52) 


is the relative angular momentum vector,, and L is the vector of torques 
exerted on the earth. 

Replacing oo from Euler’s geometric equations into the Liouville 
equation, we obtain the following s econd-order nonlinear differential 
equation 

c S (S ,S ♦ (c S (5) ♦ £S W }* + * + [<W> A ] + 

+ h ^ - L (4.53) 


To find an analytical solution e to the above deterministic equation for known 
__ (w _ 

C(t), h(t), L(t) and initial conditions e(to), e(1o) appears to be an impossible 
task without the help of some simplifying approximations. 
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We can write Liouville’s equation symbolically as 

^C{e (t) } = L(t) . (4.54) 

where ^ stands for the relevant differential operator. In reality, the 
functions Cjt), h(t), L(t) are only partially known, and they should be rather 
modeled as stochastic functions C_ (t, co), h(t, co), L(t, co). Liouville's 
equation now becomes a random equation 

X ((0) [e(t) } = L(t, co) (4.55) 

where is a random operator. (See [Bharueha-Reid, 1972, p. 71] for 
relevant definition and discussion. ) The proper mathematical theory for the 
treatment of such a random equation is probabilistic functional analysis. 

The solution is a stochastic process e(t, to) whose distribution depends on 
the (possibly random) initial conditions ^(to), e(tb) and the distributions of the 
random functions C(t, CO), h(t, CO), L(t, CO). The introduction of probabilistic 
concepts can only increase the difficulties in solving the above equation. 

Alternatively, if we set y(t) = [ e(t) e(t)] T , we can rewrite the Liouville 
equation in the form 


- 

dt 


-(CS)' 1 {[CS + CS]e + h + [(Se)A] (C Se + h) + l| 


or shortly 

® = f (y(t), t) 


(4.56) 


(4.57) 


The stochastic analogue of this differential equation is 

= f (y(t, CO), t, CO) (4.58) 


Upon integration we obtain a random nonlinear integral oquation of the 
Volterra typo [Bharucha-Reid, 1972, p. 187]: 
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y(t. u) 


(4.59) 


t 

“ y(to, W) + j £( 7 ( 8 , W), s, CO) ds 
to 

Because of the complexity of the function f , both the stochastic differential 
equation and its integral counterpart fall outside the types extensively 
studied by mathematicians. We shall therefore have to introduce some 
linearization giving rise to a sufficiently accurate approximate equation of 
a simpler form. To do this we need to introduce a more convenient set of 
parameters than e, c describing the rotational state of the earth. 

r 

, 4.3.3 Alternative Parameterization of the Rotation of die Earth 

In view of Euler's geometric equations, an alternative set of state 
parameters is [ e T co T ] T . The rotation vector to can be transformed into the 
same vector O j with respect to the inertial frame, with the help of the 
transformation 

Oj(t) = R*($(t» RL(9(t))'lfeflp(t)) ®(t) = R(e(t))co(t) (4.60) 

We can now introduce an alternative parameterization which conforms to 
the traditional separation of the earth's rotation into three parts: 

(a) Variation of the direction of the vector Qj with respect to the inertial 
- frame (precession-nutation). 

(b) Variation of the direction of the vector co with respect to the earth- 
fixed frame (polar motion) . 

(c) Variation in the magnitude of the rotation vector, i.e. , variation in the 
angular velocity of the rotation of the earth (length-of-day variations). 

The matrix M(t) of transformation from the inertial to the earth -fixed system 
can be written in the form 
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This representation differs from the classical one [Mueller, 1969]: 

M(t) = Ba(-Xp) Hi(-yp) Rs(GAST) Ri(-€-Ac) Sa(-A0) Ri(0 

Rs(~z) Rb*(0) S 3 (“Co) ' <4.62) 

by the fact that no differentiation is made between precession and nutation, 
and the S 3 part in the precession-nutation transformation has been included 
into the initial epoch angle 0 o of the diurnal rotation. 

The earth-fixed system can be chosen to be sufficiently close to the 
instantaneous rotation axis (at least for quite a long time interval where 
secular polar motion causes no problem) so that 


, -1 Wi _ 

tan 

CO3 CO 3 £2 

(4.63) 

, -1 40 2 C0 2 40 2 

tan = 7— = 77 

Vw! +<»! “ 3 

where 

i 

a = || co [| = || Oj || = (to? + a)! + <4) 8 (4.64)- 

The same approximation is not valid for because of its variation with 
respect to the inertial frame . 

/v 

Introducing a new geocentric moving frame X, Y, Z connected to the 
inertial oneX, Y, Z through the precession -nutation theory transformation 




Rxt-e-Ae) K3(-A0) R x (e) Rg(-z) Re(6) Ra(-Co) 


X 

Y 

Z 


= N(t) P(t) 


X 

Y 

Z 


(4.65) 


the rotation vector Cl with respect to X, Y, Z is near the Z axis . With the 
help of the same approximations as for w, we obtain 


t 

M(t) = Q «> (®0 +_/"«(=) ds) Q' (§■, N(t) P(t) 

to 

(4.66) 


where for small angles p, q 


Q(p, q) = R 2 (-P) Ri(q) 


1 0 p 

0 1 q 

-P -q 1 


(4.67) 


4.3.4 The Linearized Liouville Equation 

Since the rotation vector o o is close to the z axis, it can be approxi- 
mated to the first order by a vector co 0 = [0 0 £2 r} t , where CIr is an 

approximate constant reference value for the angular velocity of rotation 
Cl (t). We can introduce a vector m of small quantities defined as 

+ Ci R m (t) (4.68) 

In a similar way, the inertia tensor (matrix) can be approximated by a 
constant matrix 


oo(t) = co 0 + 6co 


0 

0 

CIr 


no 



A 0 


0 


Co = 0 


(4.69) 


0 0 


and a small correction matrix c(t) can be defined as 


C(t) = Co + c(t) 


(4.70) 


Muhk and MacDonald [1960, p. 38, Section 6. 1] show that with the help of the 
above approximations and neglecting terms of products and squares of the 
small dimensionless quantities Cjj/C, mi, and h t /(0 R C), the Liouville equa- 
tion can be written in the following linearized form: 


m (t) = cr r 1 


0 or r <Pz (t) 

0 m(t) + -cr r <pi(t) = o r Pm(t) + f * (t) 


0 0 


<p 3 (t) 


(4.71) 


0 — A 2 IT 

where o r = — r~ Or is the Eulerian frequency (period — « 10 months) 
A O' r 


&T (Pz A*( Or 023 — Ci 3 + hs - Or hi + Or Ll) 

f — ~U r <pi = A 1 (-OrCi 3 — C23 - hi - Or h 2 + Or L 2) 

<p 3 C*( - 633 - Or* hg + Or I43) 

- Co {Or Pc - t + PTl - OrH + Or 1 L } 
with c — [C13 Cga C33 ] T . 


(4.72) 


Matrix notation is more convenient for computations, but we will als o 
use complex notation for its analytical advantages. In complex notation, the 
first two of the Liouville equations become 


. m <•*■> ~ 

1 — + m = <p 

CTr 


(4.73) 


m = mi +. imj 


<p = <p 1 + i«p 3 


(4.74) 



Setting f = f* + i f 2 , we have - — f * and 

Or 

na • = 1 CT r m + f 


(4.75) 


We shall call f * the total excitation function in discrepancy with Munk and 
MacDonald who call <p the excitation function. 

Of primary importance is the part of the excitation function f D , due to 
the rotational deformation of the earth. Rotational deformation gives rise to 
the following changes in the products of inertia: 


k k 

Cm = (C-A)~mi. = A^r a r r-nii 

Kf K f 

. ^-1 k 

C 33 — A Si r cr r ■ m 2 


(4.76) 


where k is one of the Love numbers and kj the corresponding "fluid" Love 
number [Munk and MacDonald, 1960, Chapter 5], Using these product of 
inertia changes, we obtain the excitation due to rotational deformation 

= - a r ~ Pm (4.77) 

Kf 


Setting f * = f D + f, where f is the remaining, or simply the excitation, 
function, we obtain 

. k — / k \ — 

m = cr r Pm - 0 r r — Pm + f = Cr r {l-“IPm + f (4.78) 

in = CToPm + f (4.79) 

2tt 

where cr 0 = cr r (1 - k/fef) is the Chandler frequency (period — » 14 months). 

(Tq 

In complex notation we have, after setting f - fi + i f 2 , 

m = i CT 0 m + f (4.80) 

The solution of the above equation with f ~ 0 gives rise to a circular motion 
of the rotation axis with frequency Ob (Chandler wobble). To explain the 


fo = 


k_ 

’•kr 


m 2 

-mi 

0 
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broadening of the spectral peak corresponding to the Chandler frequency, it 
has been suggested that the earth should be regarded as a damped linear 
oscillator tuned to the Chandler frequency and irregularly excited [Rochester, 
1970, p. 9]. 

To account for damping, a new linear term can be added to the 
linearized Liouville equation 


with 


m 


Ob Pm 


1 r> — 

— D m + 
T — 



D 


10 0 
0 1 0 
0 0 0 


(4.81) 


(4.82) 


where T is the relaxation time associated with the damping. In complex 
notation we can introduce the concept of a complex frequency Oc = (To + i/T 
to obtain [Smylie et al. , 1973, p. 395 


m = i a c m + f 


(4.83) 


4.3.5 Solution of the Linearized Liouville Equation 

We are concerned here with the solution of the linearized Liouville 
equation without damping 

ii(t) = cr 0 P m (t) + I(t) (4.84) 

The last of the above three equations has the obvious solution 


m 3 (t) 


t 

m 3 (to) + f (s) ds 


/ 

to 


(4.85) 


while the first two can be written in complex form 

m (t) = i cr 0 m (t) + f (t) (4 . 86) 

with general solution [Munk and MacDonald, 1960, p. 46, equation (6. 7.1)] 
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fn(t) = e iCTot m 0 + J* f( r ) dr 

-00 


(4.87) 


Setting t = t 0 and solving for m 0 we obtain 

to 

m 0 = e ' iCToto m(to) - / e ‘ ' f(T) dr 


/ 


( 4 . 88 ) 


-00 


Setting this value in the general solution, we have 

t 

m(t) = e 100 ^ to ^ m(t 0 ) + J e icr o(t-r) ^ dr 

to 


(4.89) 


Separating real and imaginary parts and combining with the solution for 
m 3 (t) into matrix notation, we finally have 


m(t) 


Bsl-Po (t -t 0 )] m (t 0 ) 



Bq [-ao(t-T)] f (T) d T 


(4.90) 


It is well known [McGarty, 1974, Section 2.2] that the solution to a linear 
differential equation 

^ = A(t) x(t) + y(t) (4.91) 


is of the form 


x(t) = ^(t, t 0 ) x(to) + / $(t, 7) y(T) dT 


/ 


to 


(4.92) 


where $ (t, s), called the state transition matrix, is the solution of the 
differential equation 
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(4.93) 


^ $<t, tb) = A(t) $(t, to) 

with initial condition § (to, to) = I, the identity matrix. 

In our case it is easy to show' that 

(t, s) = R 3 [-Ob (t - s)] (4.94). 

Differentiating with respect to t, we obtain . ■ 

~ $(t, to) = — R 3 [-C 0 (t- to)] = c 0 P R 3 [-Co(t-to)] (4.95) 

and also the initial condition is satisfied, since 

$(t, t) = R 3 [-c 0 (t — t)] = R 3 (0) = I (4.96) 

for any t. R 3 [-c 0 (t- s)] also satisfies the transition property 
$(tl» ts) ^(tg, t3) = R 3 [ -0r O (tl-t^g)] R 3 '[~Co (t 2 “t 3 )] = 

= R 3 [-Co (ti — tg+tjg— 1 q)] = R 3 [-Co (ti - to)] “ ^ (ti, ts) (4.97) 
and the property 

^(t, s) = Ra[-C 0 (t-s)] = R 3 [-C 0 (s-t)) = 

= $(s, t) .= $ T (t, s) (4.98) 

4.3.6 Stochastic Solutions of the Linearized Liouville Equation 

Because of the uncertainty about the exact form of the excitation 
function, it is not possible to obtain a solution for m(t) in terms of a finite 
number of parameters to be estimated from sufficient observational data. 
Instead, a stochastic model for f (t) is more appropriate to account for 
existing uncertainties. 

We shall assume that the excitation function is modeled as a 
second-order stochastic process f(t, to) with known mean value function 

Mf(t) = E{f(t, to)} (4.99) 
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and correlation matrix 


Rff (t, s) = E { f (t, to) f T (s, to)} ' (4.100) 

The solution is also a stochastic-process m (t, to)- with- mean value function 

t 

r 

iZm(t) = R 3 [-(1 o (t-to)] m(to) + / R 3 t-CTo(t-T)] JZf (T) dT 


7 

to 


(4.101) 


and correlation matrix R n m (t, s). To avoid complicated formulas we 
introduce 


z (t) = R 3 [-Oo (t - t 0 )l m (t 0 ) 

y(t, to) = m(t, co) - z (t) 


(4.102) 


with 


jiy(t) = E {y(t, co)} =J R 3 [-Oo(t-T)] jifOO dT 


(4.103) 


to 


We now have 

P m (t) = z(t) +£,<*> 


(4 . 104) 


and 


R^.ft, s) = z(t)z T (s) + /Lt y (t)z T (s) + 


+ Z (t) M y T (S) + Eyy 


(4.105) 


where 


Ryy(t, s) = J j 


Rs [Oo (4-t)] Rff (I, 0 Ra [Oo (s -£)] d£ dfc 

(4.106) 


?-to C-to 

The covariance matrix of m (t, co) can easily be found to be 


C 0 m(t, S) = Rwn(t, s) - /lm(t) (C» (S) = R yy (t, S) - ^y(t) M/(S) 

( 4 . 107 ) 
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We have assumed here that the initial value m(to) is deterministic, i.e., 
either completely known or completely unknown (a parameter to be determined 
from observations). If some a priori estimate of m (to) is available, it can be 
modeled as a vector random variable with mean Mo and correlation matrix 
Ro. Under the additional assumption that m (to, to) is uncorrelated to f(t, to), 
so that E {m(to, to) F(t, to)} = 0 for every t, we obtain 


V 

/ 


Mm(t) = R 3 [-Oo (t-to)] Mo + J Rs [-ffo(t-T)] fJt (T) dT = 

to 

- M*< t)+ji y (t) (4.108) 


Ran (t, s) - R 3 [-Oo (t-to)] Ro Ra [CTo (S -to)] + 
t s 


/ / 


+ I I R 3 [-ab(t-€)] Rff (£,£) Rs[cro(s-C)3 d£ d£ 

C=to C=to ( 4 . 109 ) 


and 


Cna(t, S) = Rob ( t, S) - Mn(t) Mb (S) 


(4.110) 


From the mean and correlation of m(t, to), the corresponding mean and 
correlation of the rotation vector to (t, to ) can be found with the help of the 
the transformation co = cU 0 + O R m 


Mo>(t) = 


0 

0 

Or i 


+ Or |i, (t) 


and 


B«®(t, s) - Of 


0 

0 


0 

0 


M (t) U (t) 
m i m 3 


(4.111) 


M (s) 

mi 

M (s) 
m 3 

1 + f* < S >, 

m 3 m 3 j 


Rnm(t, S) 


(4.112) 
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We have parameterized the rotation, of the earth in terms of 0O1/G, 

O, O-i/H, 0,^/0 , . These parameters can be sufficiently approximated by 
C0i/Q R = mi, w 2 /O r = m2, CO3 — £2 R (1 + 103), CI 2 /&R respectively. 

Setting Mi = Ui/£2 R , M 2 = Q.s/0, Ry the rotation ti a n s formation becomes 

* t 

M(t) = Q(mi, m 3 ) R3 [©0 + (t-to) + £2 R J m3(r)dr] 

to 

Q t (Mi, Mg) N(t) P(t) (4.113) 


In addition to mi, m 2 , m3, we also need the statistics of Mi and M^ As an 
intermediate step we need to obtain the mean and covariance function of the 
rotation vector with respect to the inertial frame from the mean and 
covariance of the vector Zd'. Unfortunately, this requires the solution of 
the nonlinear stochastic differential equation 


d e (t) 
dt 


F(e, co) = S 1 ( e ) co 


(4.114) 


Even approximate solutions involve computationally tedious numerical 
integrations . We can use die mean value function /Ju> of cd to obtain 
through numerical integration an approximate solution e 0 of the deterministic 
equation 


d ep 
dt 


S ' 1 ( e 0 ) 


(4.115) 


A linearization is now possible through Taylor series expansion about 
e 0 , and neglect of second- and higher -order terms. Introducing 
6e = e - e'o, 6CJ = co - flu,, we obtain 


d 6 e (t) 
dt 


A(t) 6 e (t) + B(t) 6 co (t) 


(4.116) 


where 
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A(t) 


d e 
d e 


e 0 , Mm 


_ 5[S" l (e 0 )] — 

- — rr 4&> 

O Co 


(4.117) 


and 


^ '= U 


'• = s: x (eo) 


(4.118) 


eo 


Numerical integration of 

^ $(t, to) = A(t) $(t, to) 


(4.119) 


with 


$(to, to) - I (4.120) 

yields the state transition matrix (t, s), which can be used to find the 
mean Ji and correlation K(t, s) of 6e(t) 

t 

. . (4.121) 


.M(t) = ^(t.to)M 6 ^(to). + J *(t, 

to 


T) B (t).M 6aJ (T) dr 


K(t, s) = $(t, t 0 ) E{6e(to) 6e T (t 0 )} $ T (s, to) + 
t s 

* 7 / $(t, i) B(4) e (6co <£) 6aJ T (C)} B t (0 $ T (s,0 d£ dC 


4~to C~to 
A similar linearization of the equation 

— R(e ) W = Ra($) Ri(0) R a (<p) (0 


(4.122) 


(4.123) 


leads to approximate statistics for Oj with the use of the known mean and 
covariance functions of e and to , and finally to those of 0 and Mi, Ms. 

Such a laborious propagation might be of some value despite the approxi- 
mate linearizations involved if the true mean and covariance function of 
the excitation function were known. However, this is not the case at present. 


To obtain such ’’true" statistics one should identify all physical processes 
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contributing to the total excitation function, and also estimate their mean 
and covariance functions. The present uncertainties about mechanisms 
with effects on the earth’s rotation [Rochester, 1973] limit the feasibility 
of such an approach . The alternative is to use a simple model covariance 
function for the excitation function, which is computationally tractable and 
hopefully approximates the true one. In this case the propagation from the 
statistics of mi, m 2 , m 3 to those of Mi, Ms, simply offers model consistency • 
which might be questionable in view of the approximations involved in the 
linearizations. An independent simple covariance model for Mi, M 2 may 
serve our purpose and also considerably reduce the computational effort 
involved. 

Some more or less obvious properties that one might postulate on 
the excitation function are zero mean and stationarity of the covariance 
function . 

A zero mean function gives rise to a circular motion of the pole 
with the Chandler frequency, corresponding to the "expected" behavior of 
the polar motion. Other irregularities superimposed on the Chandler wobble 
are accounted for by the fluctuations of the actual excitation function about 
its mean value function. However, the linearized Liouville equations holds 
for an.arbitrary reference frame, while changes in reference frame cause 
changes in the excitation function. With this in mind, the zero mean condition 
on the excitation function must obviously hold for only one particular 
reference frame. Since 

_ . _ 1 — 1 

Co f = fi«Pc + c + Ph-^h+^L (4.124) 

and, assuming, 

‘ E{t} = E { h } = E it] = EIl} - O' (4.125) 

the condition E { f } - implies 

> , 1 * x 

■ E { c 3 - -0 , i.c. , Ifiteis},* 0, 1 = 1, 2, 3 (4.126) 
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For this to be true it is necessary that the direction of the z-axis in the 
earth-fixed frame coincide with the ’'expected" direction of the correspond- 
ing principal axis of inertia z- If such an estimate is not available, the 
> introduction of two additional parameters of rotation from the Zj to the 
z-axis 


X' = R s flu) Ri(-X) Xj 



(4.127) 


leads to a new excitation function f (the old one being fj with E £fj 3 - 0 ) 
with 


E C f } 



(4.128) 


The frequency Co in place of the original one a r = A x (C-A) accounts for 
the compensation of the difference in the rotational deformation part of the 
total excitation function. Use of the above mean value function results in 
the following mean value of the solution 


E { m (t) } = 6 + R 3 [-Ob(t-to)] [E [m(to)3 - 6] (4.129) 


where 

6 = [ft 1 0] T (4.130) 

It can be seen by plotting the solution on the xy -plane (see Figure 4 . 1) that 
the difference is only in a translation corresponding to the change of direc- 
tion of the z-axis . 

The mean value function, except from the above effect due to the 
deviation of the z-axis from the principal axis of inertia, may also include 
periodic terms already identified from past experience in polar motion 
data analysis (annual, semiannual terms, etc.). 
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Figure 4.1 Effect of the Change of Origin in the Liouville Equation 
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The stationarity assumption of the covariance function corresponds 
to the concept of regularity in the physical processes giving rise to earth 
rotation, at least for periods of time which are short on a geological time 
scale. This regularity is essential in introducing the concept of ergodicity, 
which legitimizes the very use of probabilistic models. 


4.3.7 Solution of the Linearized Liouville Equation with Damping 

To include the effect of damping, a linear term is added to the 
linearized Liouville equation 


with 


M(t) 


cr 0 P m (t) 


1 

T 


D m (t) + f (t) 


(4.131) 



10 0 
0 1 0 
0 0 0 


(4.132) 


The only difference appears in the first two- equations which in complex 
notation read 

m(t) = iffcmitjT^t) (4.133) 


with 


cr c 


a o + “ 


(4.134) 


The general deterministic solution is 



t 


icr c t 

e 



f(i) e 


-ia c £ 



(4.135) 


Setting t = t 0 , solving for m 0 , and replacing it in the above equation, we 
finally obtain 
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m(t> = e *%(‘-to) ffi(to) + y e *®>(‘-e r ({) d5 

to^ 


- e tffo(t-‘o) ffi(to) + l\ 10-0 (t-5) 


Z * 1 


to 

e ‘( t_ ^)/ T ?($)d$ (4.136) 

Combining with the unchanged solution of m 3 (t) in matrix notation we have. 

t 

m(t) = R s I-a 0 (t-tb)] Q(t-to) m(to)+ J Rs[-0b(t-5)] Q(t-£) T(£) 


to 


(4.137) 


where 


Q(a) = 


"e -a/T 0 o' 

„ -a/r 

0 e .0 


0 


0 


(4.138) 


is the "damping" matrix with the property 

Q(a) R 3 (0) = R 3 (©)Q(a) (4.139) 

The corresponding state transition matrix obviously is 

$(t, s) = R 3 [— CTo (t- s)] Q(t-s) (4.140) 

In the absence of any excitation, the corresponding free motion 
(on the xy-plane) is a contracting spiral whose center is the inertia 
symmetry axis [Rudnick, 1956, p. 137]. Since the Chandler wobble is known 
to be maintained, the mean value function of the excitation must be such that 


(t) = E { m (t) ) = e 


CZ /+\ 1 - Q iOb(t-to)~ 


(4.141) 


This implies that 
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i cr 0 e 


i <7 0 (t-t 0 ) 


a 


i cr c Mb + 


and, therefore. 


- E{T(t)} 


A e i o- 0 (t-t 0 )~ 

T 


(4.142) 


(4.143) 


The above condition must hold for proposed models of the excitation function. 

The damping-excitation hypothesis, put forward as an explanation 
of the broadening of the spectral peak corresponding to the Chandler fre- 
quency, poses to this day two outstanding problems in polar motion analysis. 
The first problem is the nature of the damping or dissipation mechanism, 
manifested in our equations by the uncertainty in the value of the decay 
time T . The second problem, the nature of the excitation of the Chandler 
wobble, has been the matter of much controversy, mostly centered about 
ihe role of seismic activity and initiated by the work of Mansinha and Smylie 
[1967]. A detailed account of the problem can be found in [Dahlen, 1971] . 

In general, the excitation function may be considered to consist of 
two parts 

f = fc + fp (4.144) 

a continuous- one fc and a discontinuous one fpt the latter associated with 
abrupt changes of the earth's inertia tensor caused by earthquakes. 

The discontinuous excitation function is of the form 

f P = Co (OrPc - t } 

- _ r -»T (4.145) 

C - [CJ3 C23 C 33 ] 


The first two components in complex notation are 

% = fpx + fp 2 i - AMiOrC+c} (4 146) 

C = Cl3 + i C23 

If Acj (Acij) denotes the change in c caused by the earthquake after some 
initial epoch t 0 , occurring at epoch tj, we have [Mansinha and Smylie, 1967, 


) 
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p. 4733; Dahlen, 1971, p. 162] 


where 


c(t) = X Acj u(t-tj) 

J 

£(t) = X Acj 6(t-tj) 

j 


(4.147) 


U(t-tj) 


0 t < tj 

is the Heaviside step function 

,1 f* tj 


6 (t-tj) 


/ 0 t ^ t, 

| . is the Dirac delta, function 

(l t= tj 


The excitation function becomes 

fp<£) = ~ L J 3 ~^2 A * j u ^"^> “ A X A ° j (4.148) 

J j 

Inserting?^ into the solution of the Liouville equation with damping, we 
arrive after formal integration to 


~ i Oc (t — to) ~ ^ \ A A ~ (^ "t - Ctc) ' A ~ Oc (t “ tj) 

m(t) = e m(to) + ^ / A Cj - A y A c 3 e 

J ' J (4.149) 

Separating real from imaginary parts and combining with the solution of 
the third component 


m 3 = m 3 (to) + / fa (£) d£ = 


/ 


to 


= ni3(to) + 


lr 

f[-cT, Ae -> 6 «-«] d « = 


■m3(t 0 ) 


- h E ^ 
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(4.150) 



Ini matrix notation 


m 


(t) = Ra[-0r o (t-to)] Q(t-to) m(to) + B A cj + 

J 

+ ^ ' G R 3 [~C7o (t-tj)] Q(t-tj) Acj (4.151) 


where 


G = 


A(0q +T" 2 ) 


ft a 0 + Co + r -2 
-ft/r 
0 


L 


ft 


B - 2 . 2 

— a 0 + t 


a 0 /A i/ta 

-l/TA (To /A 

0 0 


. ft/r 

£3 Ob + Oo + t“ 2 
0 


0 

0 

-1/C 


J 


(4.152) 


(4.153) 


(Note that all matrices Q, B, G, Ra(0) commute.) 


- 3 

The above equation is a solution to the deterministic problem when the 
occurrence epochs and the effect of the earthquakes on the inertia tensor 
of the earth are known. It can also be viewed as the solution correspond- 
ing to a sample function in the stochastic case where uncertainty is present 
in both the occurrence epochs tj and the inertia tensor changes Acj . A 
stochastic model can be constructed with the help of two stochastic processes, 
the (homogenous) Poisson count process and the filtered Poisson process. 

• Refer to [Parzen, 1962, Chapter 4; Snyder, 1975, Chapters 2 and 4) for the 
relevant rigorous definitions . 

The Poisson count process N(t) refers to the number of earthquakes 
occurring in the interval [t 0 , t] (N(to) = 0 w.p.l), and it is characterized 
by the probability 


-X(t-s) .m m. 
e ' X (t-s) 

ml 


t > s > t 0 

(4.154) 


PtN (t)- N (S )- m! 
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where X > 0 is called the intensity (or mean rate) of the- process, in view of 
E { N(t) } = E {.N(t) - N(tb) 3 = X(t-to) (4.155) 

A filtered Poisson process is a stochastic process of the form 


N(t) 

xw - g (t, tj, %) 

)=i 


(4.156) 


where Yj is a sequence of independent random vectors identically distrib- 
uted with a random vector Y. 

The part of the damping-excitation solution m (t) of the Liouville 


equation, independent of the initial state m (to) 

N (t ) 


X(t) 



B + GRa -Ob(t-tj) Q(t-tj)} Aej 




(4.157) 


is a filtered Poisson process with 

g(t, tj, Ac,) = (B + GRa - CTo(t-tj) Q(t~tj)} Acj (4.158) 


provided that earthquakes occur with a probability distribution such as 
described by the Poisson count process and the corresponding inertia 
tensor changes Acj are identically distributed. 

If E { Acj } - E {Ac } and E {A cj Acj T } = E {Ac Ac 7 } are known , 
the mean value and correlation of X (t) can be found with the help of 


E 


{X(t)3 = ' xj E{g(t , t, Ac) .d T 
to 


(4.159) 


and 


min(t, s) 


E{X(t)X T (s)3 * X J E{g(t, r, Ac) g T (s, T, Ac)} dT 


to 


(4.160) 
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A usual simplification is to replace the epochs tj as described by N(t), with 
a sequence of equidistant epochs, such that 

t J+ i - tj = At = X' 1 (4.161) 

in which case the filtered Poisson process is replaced by a random walk 
in R 3 [Mansinha and Smylie, 1967, p. 4739] . 
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5. ADAPTIVE ESTIMATION 


5.1 Introductory Remarks 

Perhaps the most intriguing problem in least squares estimation, 
is the determination of weights. Both least squares adjustment and pre- 
diction are based upon the a priori knowledge of means and covariances 
of random parameters and stochastic processes (signals). The presence 
of functions with uncertainties significant compared to the level of observa- 
tional noise, necessitates the inclusion of their effect in modeling. One way 
to circumvent this problem is to include only the values of the function at 
observation points (epochs) as unknowns and to secure enough observations 
to estimate all the unknowns. This is actually done in geometric methods 
of satellite geodesy, where simultaneity of observations increases the 
number of observations without a similar increase in unknowns (satellite 
positions). However, this is not always possible and treating each function 
value as an independent unknown generally results in overparameterization. 

One way to avoid this problem is to represent the unknown function(s) 
in terms of a finite number of parameters, using polynomials, trigonometric 
series, step functions, etc. Although with a sufficient number of terms it is 
possible to approximate a wide class of functions arbitrarily well with the 
help of such representations, the fact that the function to be approximated 
is unknown poses some serious problems. One has to determine the number 
of terms to be included; and even then some of the coefficients of the term s 
included might be very poorly determined from the available observations, 
thus giving rise to singularities and computational problems. 



The alternative approach is to model unknown functions as stochastic 
processes such that their' probability distributions determine the "likely " 
behaviour of the unknown functions. When the functions to be modeled are 
not directly observable, statistical sampling techniques for obtaining 
estimates of their means and covariances are not applicable. In this case, 
proper mean and covariance models have to be determined from the only 
available source of information, the observations themselves. We say in • 
this case that the stochastic model is "adapted" to the observational data 
and the resulting adaptive estimation techniques are the subject of this 
chapter. 

5.2. Review of the Continuous -Discrete Kalman -Buev Filter 

The problem in question is the estimation of the state vector X(t) 
of a dynamical. system whose evolution in time is described by the (contin- 
uous) linear differential equation 

^ X(t) = A(t) X(t) + G(t) u(.t, «) (5.1) 

from a finite number of observations (discrete observations) 

y ( 1 1 ) = Hi X(ti) + n(t t ,< 0 ) i=l, 2, . . , m (5.2) 

A(t), G(t), H j are known matrices, the forcing term u(t, to) is a given 
vector stochastic process, and n, ( co) ~ n(tj, co) is a given sequence of 
random vectors. 

The term filtering refers to the estimation of X(t B ) and the term 
smoothing to the estimation of X(t t ), i=l,2, . . ,m-l, from all the obser- 
vations. The more general term prediction refers to the estimation of 
X(t) for any epoch t, not necessarily coinciding with any of the observation 
epochs t j . 
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•If the random variables -in X(t 4 , (o), n t ( to) are second -order with 
known means and covariances, the filtering -.smoothing pr.oblem can be 
solved with the help of classical least squares adjustment techniques 
(condition equations), and with a minimum variance criterion for estimate 
optimality. 

If a set of completely unknown parameters is involved, the general- 
ized least squares method is to be used. 

The prediction problem can be separately solved afterwards with the 
help of the estimates of X(t t ) obtained, and the mean and covariance function 
of X(t), as a minimum variance least squares prediction problem. 

The problem has therefore been reduced to one of obtaining first and 
second order statistics forX(t, oo) from those of u(t, co) and' the initial 
value X(to , CO ) (in case it is modeled as a vector of random variables and 
not as a vector of unknown parameters). 

Usual assumptions are that u(t), X(t 0 ), n j are mutually independent 
and that n x is a sequence of Gaussian independent random vectors. 

If u(t) is a known stochastic process, so is G(t) u(t) and we can 
drop the coefficient matrix G(t) without any loss of generality. The state 
differential equation can now be formally integrated with the help of the state 
transition matrix 3>(t, s ), to obtain 


t 

X(t, CO) = $(t,to) X(t 0 ,w) + J $(t, i) u(4, co) d£ 


( 5 . 3 ) 


Since u(t, to) is a stochastic process the integral 


t 

y(t, oo) = j $(t, 


€) u(£,«) d£ 


( 5 . 4 ) 


■ t, 
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cannot be trivially defined as a Riemann integral in the general case. 

The most trivial case appears when u(t, co) belongs to the restricted class 
of processes with Riemann integrable sample functions. We may relax the 
integrability condition for sample functions u 0) (t) with w£ACfl, when 
P (A ) = 0. In this case we say that almost all sample functions are Riemann 
integrable, or that u{t, co) is Riemann integrable w.p.l (with probability . 
one). A more wide class of stochastic processes may be obtained by 
requiring that the above integral exists in the mean square sense (see 
[Jazwinski, 1970, p. 66] for definition). Mean square Riemann integrability 
of u(t, co) is equivalent to Riemann integrability of its mean, correlation 
and covariance functions [Jazwinski, 1970, theorem 3.7 and corollary 1, 
pp. 66 & 67]. In this case the propagation from first and second order 
statistics of u(t, co) to those of y(t, co) can be carried out with the help of 
the (deterministic- ordinary) Riemann integrals. 

t 

E [y(t, co) } = J $(t,£) E { u(£, co) } d£ (5.5) 

*0 


E{y(t,co) y T (s,co)} 


t - S 


■// 


$(t,£) E{u(£,co) u t (£,co)} $ t (s,£) d£ dC 

(5.6) 


A stochastic process attracting considerable attention because of its 
applicability to engineering problems is the Gaussian white noise process, 
with zero mean, and covariance matrix 

E { u(t, co) u T (s, co) } = Q(t) 6(t-s) (5.7) 

where 6 (t - s ) is the Dirac delta function. The white noise process has 
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everywhere discontinuous sample functions and fails to be Riemann mean 
square integrable. There is no shortcoming on the part of mathematical 
theory here, for white noise is not a physically realizable process, but only 
an idealization introduced mostly for the simplifying integration properties 
of the Dirac delta function. It is possible to formally represent u(t, co) as 
the derivative of the Wiener process W (t, o>) [MeGarty, 1974, p. 80] 

’ u(t, w) = ~ W (t, w) (5.8) 

In view of this relation' one might attempt to- define an integral of the form 

b 

j .G(t) u(t,.co) dt 

a 

as a Stieltjes integral with the help of the Wiener process 

b 

J G (t) dW(t,oo) 

a 

This does not solve the problem because although almost all of the sample 
functions W° > (t) are uniformly continuous, they are not of bounded' varia- 
tion [MeGarty, 1974, Section 3.3]. Among various definitions of the 
above integral, the one more widely used is that due to K. Ito. We shall 
refer to [ Jazwinski, 1970, Chapter 4] for details, restricting ourselves to 
the formal rule for obtaining the covariance of the process 

t 

y(t,o>) = j $<t,£) .dW(£,to) : (5.9) 

to 
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E {y(t, co) y T '(s, co) } • = 


min(t, s)- 

*(t,£) Q(|) $ T (s>$) d$ (5.10) 

This rule agrees with the definition of y(t, co) as an Ito stochastic integral 
and may also be formally derived from the covariance Q(t) 6(t-s ) of the 

' * * v i 

corresponding white noise process using the integration properties of the 
Dirac delta function. 

We have succeeded in writing the solution of the state differential 
equation for X(t) driven by white noise, in the form 

X(t) = 3>(t, t 0 ) X(t 0 ) + y.(t) (5.11) 

From the known first and second order statistics of X(t 0 ) and y{t), those; 
of X(t) can be easily derived and used to solve the filtering - smoothing and 
prediction problem, either globally by standard least squares adjustment 
and prediction techniques or, most often, by means of a sequential refor- 
mulation of the solution algorithm. The solution is similar when u{ t, <0 ) is 
a process with Riemann integrable sample functions or simply a Riemann 
mean square integrable process . 

5.3 Modeling of Stochastic Processes 

In real life problems, when the state of a dynamical system (such as 
the orbit of an artificial satellite or the rotation of the earth) is generated 
by a differential equation, the forcing term u(t) is in most cases an unknown 
function. The modeling of u( t ) as a stochastic process u(t, oo) presuppo- 
ses that although u(t) is not precisely known, it is not completely unknown 
either. This situation of relative uncertainty about u(t) is manifested in 
the probability distribution of the stochastic process model u(t,co) which 
roughly tells us what u(t) is "likely" to be, by means of a description of 
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its "average" .behaviour. 

Since we are not concerned with an ensemble of functions [u j(t) } 
but rather with a unique function u(t), the averaging has to take place over 
different parts of the function u(t) itself. This presupposes that the function 
possesses some regularity properties, such that different pieces (i. e. , the 
function oyer different time intervals) can be- considered as samples with the 
same "statistical", behaviour, and. their comparison yields the "average" 
behaviour of the function. These intuitive concepts are expressed within the 
probability theory framework by the ergodicity property of. the stochastic 
process .u(t,co) serving as a model for the real (and in a sense determin- 
istic) function u(,t). Ergodicity presupposes stationarity, and before such 
' a more or less restrictive as sumption is imposed, one .should make certain 
that the process is irreducible, in the sense that it cannot be expressed as 
a transformation of some other original process. This simply means that . 
a stationary - ergodic model should be used for unmodeled accelerations and 
the excitation function, rather than the orbit of the satellite or the rotation 
of the earth. 

If the function u(t) can be directly observed, the problem is a statis- 
tical one of determining distributions (means and covariances are sufficient 
for our purpose) from samples. However, we are primarily concerned here 
with the case when the function u(t) cannot be directly observed, or it is 
practically impossible to do so, a situation common in geodetic work. 

In this case one can only construct an empirical stochastic model with the 
help of available mathematical tools. Before constructing such a model 
one must first wonder whether the necessary tools are available in’ the first 
place. To ask the question in a different way, suppose that a stochastic 
process u(t,m)‘ exists which is an appropriate model for the unknown 
function. What are our chances' of empirically arriving at such a model ? 

, . i i 

Indeed, the mathematical literature is almost exclusively devoted to 
the study of a number of elementary processes (white noise, Wiener, 
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•Poisson) and processes derived from transformations of such processes. ; 
The answer is to be found in connection with the inverse problem, already 
studied by mathematicians: Given an arbitrary stochastic process, is it 

possible to express it in terms of some simple elementary processes ? < 

This is a problem of representation that has been given considerable 
attention. A collection .of papers on this subject can be found in 
[Ephremides and Thomas, 1973]. t 

For a wide class of stochastic processes called purely nondetermin- 
istic or linearly regural (see [Cramer, 1964, p. 170; or Cramer, 1971, 
p. 7] for definition) it is possible to obtain a representation (Cramer-Hida 
canonical representation) of the form 


X(t,co) 


t 

j g n (t,s) dz n (s,w) 

n “1 -'Co 



(5.12) 


where z n (s,co) are N mutually independent stochastic processes with 
orthogonal increments, hi simple engineering terms, X(t)can be consi- 
dered as the sum of the outputs of N linear filters with inputs white noise 
dz n 

processes . The smallest number N for which such a representation 

dt 

exists is called the (spectral) multiplicity of the process X(t). The class 
of second -order stationary processes we are concerned with here has 
spectral multiplicity one [Cramer, 1965, p. 218.]. This ascertains the 
possibility of constructing a wide class of stationary models with the help 
of the white noise process. 

Another aspect of modeling is the necessity of parameterization of the 
mean and covariance functions of the empirical model. The optimal values 
of such parameters must be determined from the observations available, 
since the non-observability of the function modeled excludes the use of 
sampling techniques . 

Empirical models are much in use in time series analysis techniques. 
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when a function u(t) is observed at equidistant epochs 1 1 (t t -t = At for 
all i). 3f u(t) is modeled as a stochastic process u(t,w), the sequence of 
random variables u 1 (co)=u(t 1 ,w) is a new discrete parameter- stochastic 
process. Three usually used models are constructed with the help of a 
discrete parameter white noise process, i.e. , a sequence of independent 
identically distributed random variables n t with 

E {n j } = 0 and E {n t n 3 } = a 2 6i a 

These are [Koopmans, 1974, Chapter 7] the moving average model: 


U ft ~ ^2 a J n *-J < 5 * 13 ) 

J=-B 


the autoregressive model: 



E-Ju 1 n lc } = 0 for 1 <k, 
(5. 14) 


and the mixed autoregressive - moving average model: 

Q 

X ^ Uic— j 

3=o 

These are finite parameter models and the first two can be extended to 
infinite by letting n, m, q-> «> and introducing the conditions ■ 

I _ +co 

X a * < " ’ 

— CO 1 


E 


1 = 0 


Cl n k _ t 


(usually d o = c o = 1) . (5.15) 
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Of particular importance is the one - sided moving average model 


U k = ^ a j n fc-j ( 5 * 16 > 


which possesses the properly of physical casuality since the state u k of the 
process at epoch t k depends only on past values n k _j of the exciting white 
noise process, and not on future ones. It can be shown that the solution to 
a finite autoregressive model is an one - sided (infinite) moving average 
[ Koopmans , 1974 , Section 7.31. 

A purely nondeterministic second - order stationary stochastic pro- 
cess X(t) satisfying certain conditions (see [Cramer, 1965, p. 219] for 
details) has multiplicity one, and canonical representation of the form 


X(t) 



g(t-s) dz(s) 


(5.17) 


An approximation of the above integral by a summation leads to 

k k 

X k = X(t k ) = £ gd^-t,) [ z(t 1 )-z(t 1 _ 1 )l = £ g k -, Ql 

i=-o> !=-«. (5.18) 

We have set z(t 1 )-z(t l _ 1 ) = n lt with E { n t } = 0 and Efnj n 3 }*= o -3 6j a , 
in view of E { dz } = 0, E {] dz(t)| 2 } = dt (z^J-zft^^dz) and the fact that 
z(t) has orthogonal increments. A simple change of the summation index 
from i to j ~ k— i gives 


X v = ^ n *-J < S ' 19 > 

J “ o 
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which is an one - sided moving average model. 

Without any approximation it can be shown that a discrete parameter 
zero mean weakly stationary stochastic process can be expressed as the sum 
of a purely deterministic and a purely nondeterministic zero mean processes 
(Wold decomposition), the latter having an one-sided (infinite) moving 
average representation [ Koopmans, 1974, p. 255]. This connection with 
the Cramer - Hida representation establishes the importance of the one - 
-sided moving average and the autoregressive' scheme (whose solution is an 
one - sided moving average) in modeling stationary stochastic processes. 

Passing from the discrete to the continuous time case and restric- 
ting ourselves to the autoregressive model in view of its finite number of 
parameters, we have 


d°u(t) 

dt“ 


n — 1 





g 


dW(t) 

dt 


(5.20) 


I . 

where W (t) is the Wiener process. Generalizing to a vector process u(t) 
and allowing the coefficient matrices to be functions of time we obtain the 
n tt order autoregressive model 


d n u(t) d n-1 u(tt du(t) 

+ B i (t) + . . . + B n _!(t) + B n (t) u(t) = 

dt“ dt”" 1 dt 

d W (t) 

= G (t) (5.21) 

dt - 


The solution u(t) of the above differential equation has multiplicity one and 
is a Gaussian Markov process [ Ephremides and Thomas, 1973, p. 13]. 

We shall limit ourselves to the first order (n=l) autoregressive process, and 
we shall investigate conditions for u(t) to be stationai'y. 
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The use of the Wiener process instead of any other process with 
independent increments restricts us to Gaussian processes. The 
"Gaussianity" assumption is essential in making possible statistical 
inference on final Gaussian estimates, since means and covariances 
completely specify in that case the corresponding probability distributions. 

Another important independent increment process is the Poisson 
(count) process and the generalized Poisson process [McGarfy, 1974, 
pp. 83-86]. The formal derivative of the generalized Poisson process is 
also a white noise process, although non-Gaussian. For the study of 
Poisson driven Markov processes we refer to [ Snyder, 1975, Section 4.2]. 


5.4 Construction of Exponentially Correlated Stationary Stochastic 
Processes from White Noise 

The most simple continuous time autoregressive model is the first 
-order one-dimensional one with constant coefficients. 


dx(t) 

+ p x(t) 

dt 


dW(t) 

dt 


E{ W(t) W(s) ) = a a 6(t-s ) 


(5.22) 


dW 

where W = . This is a linear stochastic differential equation 

dt 

with state transition function 



$(t,s) = 

-p(t-s) 

0 

(5.23) 

and solution 

x(t) = 

e “ P( t-t o) + 

t 

f e ~ p(t-£) dW(S) 

J 

(5.24) 



to 



We can differentiate between two types of solution according to whether 
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x (t 0 ) is a deterministic constant or a random variable: 

A. For x(t Q ) ~ c 0 ~ const. , we have 

■» * 'i * , 

m(t) = ' E { x (t) } = e to * c 0 

R ( t, s ) = "E { x( t ) x( s ) }, - ' e P ^ to ^ Cq e P ^ S t( ^ + 


,/ 


min(t, s) 


e - P( t-4) e -p(a-6) dC 


-[•••- 5 =] 


2 ■» e ~p{t+s) e 2pt 0 + £ e ' p l t_S 

2 P 


(5.26) 


The covariance function of x(t) is 


C(t,s) = R(t,s)-m(t) m(s) = 
of -p(t+s) 2 p t Q 

2p 

and therefore x(t) is nonstationary. 

B. For x(t 0 ) = x o (<0), with E{x 0 } = < 
E{ x 0 W(t) } = 0, we have 


m(t) = 

-p(t-to) 

e 

c O ‘ 


(5.28) 

R(t,s) = 

[•!-r P ] 

e ~p(t+s) e 2pt 0 + 0_ 2 e - 

2 P 

p|t-s! 

(5. 29) 

C(t,s) = 

r a (T 2 

r 0- T P - 

c 2 0 l e' P(t+S) e 2pt ° h -2? 
J 2 p 

e -p|-t-s 

1 • 


(5.30) 

The solution x(t) is in general a nonstationary process. It is possible to 


+ £ e “P |t-s | 
2p 


(5.27) 


! 0 , E[ Xq } = CT? and 
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obtain a stationary process by setting c 0 = 0 (so that E { x(t) } = 0 = const. ) 
2 

and ao = a 3 /2 p so that 


C(t,s) 


_Z! e - p 1 1 - s J 

2 P 


(5.31) 


We have succeeded in constructing a stationary stochastic process 
from white noise. With appropriate selection of the parameters c 3 , p we 
can obtain every such process from the class of exponentially correlated 
processes with covariance functions of the form 


C( t) = C 0 exp[- | T | / T] , r=l-s, T > 0 (5.32) 

where C 0 = C(0 ) is the variance of x* (u>) for every t, and T is the 
"correlation time" related to the "sharpness" of C(r ). The smaller th.e 
correlation time T is, the more the covariance function is concentrated 
about the C( t) axis. 

To extend these results to more dimensions consider the vector 
stochastic process X(t) generated by a general first order autoregressive 
model of the form 

— X(t) = A(.t) X(t) + ~W(t) (5.33) 

dt dt 


with E { W(t) W T (s) ] = Q(t) fi(t-s), E{X(t 0 )} — X 0 , 


E [X(to) X T (t 0 ) } = R 0 , 
The solution is 


E[W(t) X T (t 0 )} = 0 


/ 


to 


X(t) 


$(t,to) X(t 0 ) + 


$(t,4) dW(^) 


(5.34) 



m(t) 


E{X(t)} =. $(t,t 0 ) X 0 


(5.35) 


and covariance function 


min(t, s) 

C(t,s) = $(t,t 0 ) [ R 0 - X 0 Xj ] $ T (s,to) + / $(t,£) Q(0 &(s,Q d£ 

t (5-36) 

The necessary and sufficient conditions for stationariiy are m(t) = const, 
(implying X 0 = 0), and C(t, s) = C(t-s). It can be shown (see for example 
[Bucy and Joseph, 1968, p. 25] ) that K(t) = C(t,t) is the solution of the 
differential equation 

A 

— K(t ) = A(t) K(t) + K(t) A T (t) + 'Q(t) (5.37) 

dt 

with initial condition K(t 0 ) = C 0 - R 0 ~ X 0 Xj 
3h the stationary case K(t) = K(0) = K(t 0 ) = R 0 is constant and therefore 

— R 0 = 0 = A(t) R 0 + R 0 A T (t) + Q(t) (5.38) 

dt 

This is a necessary but not sufficient condition for stationarity. We can 
obtain sufficient conditions (without claims to necessity) by setting 

A(t) = A ~ const., Q(t) = Q = const., X 0 = 0, and 

A R 0 + R 0 A T + Q = 0 (5.39) 

This is equivalent to the condition 2 p = u' in the one-dimensional 

case. An additional condition corresponding to p > 0 is that the eigen- 
values of A must have negative real parts (see [Arnold, 1974, p. 133] ). 

If X(t 0 ) is also multivariate Gaussian, X(t) is a Gaussian process. 
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5.5 The Dynamic Model Compensation (PMC) Algorithm 

> Ingram and Tapley [ 1974 ] have deviced a method for the estimation 
of the state of a spacecraft in the presence of unmodeled accelerations which 
they call the "Dynamic Model Compensation" (DMC) algorithm. With the 
help of material from [Ingram, 1970, Tapley, 1973; and Ingram and Tapley, 
1974] we shall give a somewhat generalized version of this technique. One 
part of our generalization is the consideration of an unspecified dynamical 
system with state governed by a linear differential equation 

J 

— ,x( t ) = A(t) x(t) + u(t ) (5.40) 

dt 

in. place of the specific equations of spacecraft motion. u(t) is a vector 
stochastic process serving as a model for unknown (unmodeled) forcing 
terms, and generated by another first order differential equation 

— u(t) = B(p) u(t) + n(t) (5.41) 

dt 

where the coefficient matrix B(p) depends on a vector p of unknown para- 
meters. The unspecified nature of the dependence of B on p is the second 

aspect of our generalization. Ingram and Tapley consider a diagonal matrix 

✓ 

B, and p has elements the negative inverses of the diagonal elements of B. 
n(t) is a vector white noise process with E{n(t) n T (s)} = G(t) 6(t-s), serving 
as a formal representation in place of the mathematically precise Ito 
stochastic differential equation 

du(t) = B(p) u(t) + dW(t) (5.42) 

The lack of a coefficient matrix in fron of n(t) poses no restriction since the 
corresponding effect can be trivially incorporated in the covariance matrix 
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G(t) by means of a simple propagation. The constant vector p can be also 
trivially modeled by the differential equation 


* — p = 6 • (5.43) 

dt 


Summarizing, and after some obvious change' to a more compact notation, 
our state-model becomes ‘ • 


• x t =‘*A t x t +-u t (5.44a) 

u t = B(p) u t + n t (5.44b) 

p t • = 0 (5.44c) 


Integrating with the' help of initial values p (t,) = p t , u(t,) = u 1 , 
x(t t ) = x s , we obtain 


Pt = Pi 


(5.45a) 


Ui 


t 

■ “■ + / n f d « - 


®u(Ml)( p j VL V + IJt.tj.p,) 


(5.45b) 


Xt " 


^ x (t»t t ) Ij + j u^ 

t. 


= 


<Mt,t,) x t + ^(Mj) Uj + / $*(*,£) 1,(£) d£ 


t 

/ 


(5.45c) 


l-m 



where 




b 

/ 


$*(t,£) t,) d£ 


(5.46) 


Introducing 


Q^tjt^x^u^pi) = <M*>ti) x t + $ xu (t»ti.Pi) Uj (5.47a) 


0 u (t,t 1 ,u 1 ,p 1 ) = $ u (Mi) (Pi) u i 


and 


6p(Pi) 


V 


= Pi 




t?xl(t»tl,Pl) = 


t 

/ 




(5.47b) 

(5.47c) 

(5.48a) 

(5.48b) 

(5.48c) 


we arrive at 


X t = 0(t, t t ,X t ) + Tj j (t> t t »X t ) 


where 


(5.49) 


X? 


= [ *1 ui P T t ], 0t = [ 0 T u 0p ]. n\ = [ t?ui 0 ] 


For t = t 1+1 we have 

X 1+1 = + Yi i (tj+i ,tj ,Xj ) (5.50) 

This is a discrete state model and can bo used together with a set of 
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observations (assuming linear dependence on the state for simplicity) 

7i = X t + v t -, E{ Vi } = 0, E^v^R^i, (5.51) 

to solve the prediction -smoothing problem. However, there are some . 
difficulties associated with this model. One of them is the nonlinearity of 
0 with respect to p t , which we can overcome by linearization. The second 
is the dependence of the state noise r} l on the state X t (actually only on p t ), 
and the final one is the non- whiteness of the sequence £ 77 1 }. If the values 
[Pt } are considered to be constants, it can be easily seen that 

E { 7? t } = 0 and E{rj 1 'f7j}=0 for i?^j (5.52) 

However, p t is part of the state X t and as such it is a random vector. In 
view of Pi+i = Pit we have 

E{pi}=E{p 0 } and E{p t p ) } = E{p 0 po} ^ 0 (5.53) 

Because of the complex. dependence of 77 1 on p t it is difficult to evaluate 
E { 7? 1 77 }}, but the two vectors are in general correlated since p t and p i 
are. To overcome these difficulties the sequence £ 77 4 3 is approximated by 
a new sequence which does not depend on the state (p 1 ) and is also 

white (E £ 77 j. 7? j } = 0 for i^ j). 

The approximation, which is essential to the resulting algorithm, is 

A A 

to replace l t (t 1+ j ,t t , p t ) by a random vector lj such that E { lj } - 0 and 
E{li lj T } = E{l t lj } 6 4j . This we can achieve with the help of a white 
sequence n l9 such that EfnJ = 0 and E{n 1 nj}= I 6 jj, by setting 

l t = K® n 4 (5.54) 

where K { - E f 1, 1' } T }, and the expectation is taken by treating p j as a 
nonrandom constant, 
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^i(^tn»*nPi) (5.55) 


K t - / G© # T u (t,„,a a? = 


Kj depends on p t since <1> U depends on p,. It follows easily that 

E{i,l/ } = K* E{n t n/ } (K * ) T = K? (K? ) T - 6,j = K t fi tJ (5.56) 


By setting 


r- ' t 


^ ui ^li xi 


/ 


i+i 

$ x (t 1+ i,S) d£ 


It = s t 1, 


and fj\ = [ 77xt 77^ 0 ] , we have 


where 


= 0 and E{tj t rjj] = Q t 6 M 


(5.57) 


Q t = 


S, K, S t T 
Kj Si 
0 


s, K t 


K. 


0 

0 

0 


= Q t (p,) (5.58) 


We have now arrived at the state model 


X 1+1 = e^.ti.Xi) + 77 1 (5.59) 

such that the smoothing -prediction problem can be solved with the help of 
standard techniques (extended Kalman -Bucy filter, see [ Jazwinski, 1978, 
p. 278; and Tapley, 1973, p. 411]), except for the dependence of the state 
noise covariance matrix Q t ( p t ) on part of the state. Ingram and Tapley 
simply write Q l (P) without specifying which value of p is used (T x , T y , 
T z in their notation). It is natural to assume that p { is replaced 
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by its current estimate p t based upon all past observations y r (r- 1,2, . . , i). 
The central point in the derivation of the algorithm is the approxima- 

A' "* 

tion of 1, by l t . Ingram [1970] offers no explanation or motivation about 
this approximation. Further reference is given to [Jordan, 1966] (to which 
we had no access with theremark [Ingram, 1970, p. 22] that the original 
white noise process n(t) is considered constant over the interval [t^ ,t 1+ t ]. 
This, at least mathematically, is absurd. The very essence of the white 
noise process is the everywhere discontinuity of its sample functions, and 
n(£) - n(£)> £> C£[ti,tj + 1 ], is unbounded no matter how small 1 1 1+1 ~til is. 

A 

It must be pointed out that the same white sequence Vi may be 
obtained by a much simpler argument. The random vector P(t) is approx- 
imated over the interval [t 4 ,t 1+1 ] by a constant nonrandom vector equal to 
its current estimate p t (based upon past observations up to the epoch t, . 

A A A A 

In this case we replace li(ti + i,ti,p t ) by li(t i +i»t 1 ,p 1 ), so that E[lili} = 

= K t ( p t ), and proceed to obtain the white sequence Vi with E {rh Vi 5 = Qi(Pi)* 
Assuming for a moment that despite the approximations involved, the 
final solution (after smoothing where state noise covariance matrices are 
also updated) is sufficiently close to the "true" solution of the original model 


x t = A t. it + u t (5.44a) 

u t = B(p) u t + n t (5.44b) 

pt = 0 (5.44c) 

we may identify the problem with the one described by 

x t = A t x t + u t ' (5,60a) 

i : - . ■>-' . u t - B(p ,) u t + n t (5.60b) 


where p is the final optimal estimate of p based upon all the observations. 
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In this case we can directly model u t as a stochastic process with known 
mean and covariance function, spec, fed by the solution of eq. (5.60b) and 
the initial value u 0 = u(t 0 ). 

For the specific choice of 



-Px 

0 

0 


3 

0 

0- 

B(p) = 

0 

-Py 

0 

E[n t -n, } = 

0 

2 

cr y 

O' 


0 

0 

-p« 


0 

0 

a 

07 


6(t-s) 


each component of ut is an independent stochastic process with mean 


E{u x (t)} = e 


-(t-t 0 ) Pi 


E{u x (t 0 )3 


(5.61) 


anu covariance 


-(t+s)p£ 2t 0 Px . 

6 v ” 


C(t,s) = ^Efu x (t 0 )} - ^ PxJ 

e -| t "s|px + 

'° P * ^E{u x (t 0 )}) 


^ I Oi Ps e 


— (t+s) p x 
e e 


(5.62) 


with similar expressions for u y (t) and u z (t ) . 

f 

Ingram [1970, p. 22) claims stationarity of u x (t), recognizing the 
necessity of p x being a constant, but not the special relation between p x 
and the initial value u x (t 0 ) statistics also necessary for stationarity 
(see Section 5.3). The initial values are modeled as random variables with 
E { u(1q ) } = u G , E[u(t 0 ) u T (t 0 )] - u 0 Uq “ ao I [Ingram, 1970, pp.76-77]. 
If u 0 ^ 0, then E [u t ^ const., and u t is certainly nonstationary. If 
u Q = 0, then, to secure stationarity one must set 

p = 2 where CTq x = E[u x (t 0 )} (5.63) 
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in which case p x should be a fixed constant and not a parameter under 
estimation. Ingram and Tapley [ 1974, p. 195] mention that the initial 
conditions are unknown. If this implies that u 0 is a deterministic constant 
to be estimated simultaneously, we again arrive to a nonstationary solution 
in general (see Section 5.3). 

In view of the reported success of the DMC algorithm, its adaptivity 
must be looked for in the approximations themselves rather, than in the 
original rigorous state equations. If p has been fixed, the state noise 
matrix Q(t k+ x ,t k , p) should have been a priori defined. On the other hand, 
the state noise matrix Q(t k+1 ,l k ,p k ) is allowed to vary in the filtering 
algorithm, according to the current estimate p k of the parameter p . 
Although p k = Py-i in the discrete state model, the same equality does 
not hold for the corresponding estimates 

Pk ^ Pk— 1 (5.64) 

A - * 

where p j is the estimate of p based upon past observations y 3 up to the 
present epoch (j = l, 2,..., i), and the one additional observation y k pro- 
duces in general a change on the new estimate p k of p. 

The concept of an exponentially correlated forcing term can still be 
maintained within each interval [t t ,t 1+1 ] where p l is treated as a constant 
and corresponds to a "local covariance function" formulation designed to fit 
the local behavior of the unknown disturbing function. On the whole time 
interval [ t r , t n } , however, the variation of the estimates p t of p used, 
gives rise to a nonstationary stoehastic'process u(t). 

5.6 Adaptive Estimation in General 

The general adaptive estimation problem may be defined as follows: 
The evolution in time of the state x t of a dynamical system is described by 
a differential equation of the form 

x t ~ f (x t , t, € t ) . (5.65) 
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where c t represents a function of unknown disturbances. It is possible to 
model , not as a single stochastic process u t , but as a whole class of 
stochastic processes generated with the help of some other known process 
n t and a set of unspecified parameters p, 

' u t =• g(u t ,t,n t ,p) . (5.66) 

i 

Every set of parameters p specifies a different stochastic process u t , and 
an optimal set p* is sought, such that the resulting stochastic process ut 
is- an optimal model (in a sense that remains to be specified) of the unknown 
disturbances e t , for the purpose of estimating the state x t (and possibly 
another set of parameters a ) from a finite set of observations 

y* = h 4 (x tj ,t t ,a) + v t (5.67) 

where v t is a sequence of zero mean random vectors with known covari- 
ances . • 

It is essential to realize that the differential equation (5.66) does not 
model any physically realizable dynamical system, but merely serves as an 
artificial means for constructing a class of stochastic processes u t (p). 

The nature of this class must be a priori specified (by means of selecting a 
certai process n t and a certain function g(u*,t,n t ,p) ), and. its appropria- 
teness must be justified by considerations related to the nature of the 
unknown disturbances e t to be modeled. 

Since p is a vector of constants, we have p = 0, and through a 
technique called "vector augmentation" we may rewrite our model as 



x t 

* 

f(x t ,t,u t ) • 



Xt = 

• 

u t 

= 

g(u t ,t,n t ,p) 

= F(X t ,t,n t ) 

(5.68) 


Pt 


0 




The solution to the above stochastic differential equation is by no means 



trivial because of the nonlinear dependence of F(X t ,t,n t ) on X t and n t . 
The simplest linear model correspond to 


f(x t .,t,n t ) = At x t + u t (5.69a) 

g(u t ,t,n t ,p) = B t u t + D t p + n t . (5.69b) 

In this case the solution u t is of the form 


t 

r 

ft 

r 

U t ='<Mt,to) U 0 + / 3\(t,S) n^ d£ + 

J $„<*,€) 

*0 

- 4 o ■ - 


= u lt + D t p 


U lt + U 8t 


(5.70) 


u t is now the sum of a known stochastic process u j t and a parametrized 
deterministic function u 2t (p) which does not contribute to the covariance of 
u t . It is possible to include the known mean of u lt in u at , so that u t is 
modeled as a stochastic process with known covariance and parametrized 
mean. This is a well known technique (see for example [ Parzen, 1961] ) , 
but deviates from our main objective which is the adaptivity (parametriza- 
tion) of the covariance function also. 

► i 

The next simpler model is similar to the one in the DMC algorithm, 
obtained by setting 


g(u t ,t,n t ,p) = B(p) u t . + n t (5.71) 

» * , , t 

the only nonlinearity being with respect to p. 

Returning to the general nonlinear stochastic differential equation 
(5. 68), let us assume that a solution has been somehow obtained. Such a 
solution is a stochastic process X t with mean and covariance depending on 
those of n t and the initial values X Q (Xo,u 0 ,p 0 ) modeled as random vari- 
ables.. Assuming for simplicity linear observations and no parameters. 
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we have 


Vi = H, Xj + v t , i = 1» 2 , , n 


( 5 . 72 ) 


Let X denote the vector of all states x t , 

X T = [ xj ... x n T ] 

U that of Uj, and P = p 0 , since all states p t are the same. From the 
known mean and covariance of the solution process X t it is possible to 
obtain the mean and covariance of the vector 


Z T = [ X T U T P T ] , 



X 


X - 

6X 


Sxx 

S xu 

S xp 

E{Z} = 

U 


u - 

6U 

, E{ZZ t 1 - ZZ T = 

Sux 

Suu 

Su ? 


P 


p - 

6P 


Spx 

S PU 

Spp 


The observation equations can be written in the following matrix form: 


Y = H X + V or L 


Y - H X 


[ H I ] 


6X 

V 


Solving this least squares adjustment (condition equations) problem we 

A A — A 

obtain an estimate gX and X = X + gX . The effect of the observations 
on the nonobservable parts of the state U and P can be evaluated through 
least squares prediction, 

A -1 A * A 

6P = S px S xx gX , P = P + gP 

and similarly for U. X, the final estimate of the state at observation epochs 
does not depend at all on the final estimate p of p, but only on the mean and 
covariance function of the solution process X t and consequently on the mean 
and covariance of the initial value p D . In fact there is no adaptivity of the 
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mean and covariance function of the state noise process u t in the vector 
augmentation technique. The solution differs from one with fixed parame- 
ters p, only in the fact that the uncertainty in an original estimate p Q of p 
has been taken into account. 

Adaptivity can be introduced by resolving the problem, fixing p to the 
value of the obtained estimate p , evaluating mean and covariance function 
of u t , and using only the first part of the state equations 

x t = f(x t ,t,u t ) 

The optimality of the estimate p is hard to define. This estimate 

A 

depends on the observations (through X being a function of, Y),, and to this 
extend is adapted to the observational data, but it also largely depends on 
the mean and covariance of the initial estimate p 0 . Because of this, the 
vector augmentation technique fails to provide a solution to the following 
more idealized problem: 

Given the mean and covariance of the state noise u t within a set of unknown 
parameters p, find among all permissible values of p the one which is 
optimal in a sence to be defined. 

The solution x t of the state differential equation has mean and cova- 
riance function depending on the unspecified parameters p. The vector X 

, S * 

corresponding to, states at observational epochs has mean X (p) and cova- 
riance matrix S (p ), both depending on p. For linear observations the 
model for the adjustment is 

L (p) = Y - H X(p) = H 6X + V, E{6X 6X T } = S(p) 

A -« A 

and the final estimate X (p) = X(p) + 6X(p) depends also on p. 

* ' a' 

A simple way to define optimality of X ( p ) is the minimization of the vari- 
ance of any scalar linear function of X 

f q = b T X 
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i. e. , to find the optimal vector p which minimizes 


<t?(p) = b T [ E[X(p) X T (p)} - X(p) X T (p) ] b = 

= b T E{6X(p) 6X T (p)} b . 

Such a criterion however, can be justified only when the variance to be 
minimized is the "true" variance based upon "true" a priori statistics of 
both the observational errors and the signal 6X . The result is the minimi- 
zation of the nominal rather than the true variances of estimates and the 
appropriateness of such an optimality criterion is questionable. 

A more reasonable criterion of optimality is to be found through 
intuitive reasoning, rather than in any rigorous definition. In essence, the 
role of estimation is to separate the useful signal related to the states of the 
dynamical system from the unwanted observational noise. If such a separa- 
tion has not taken place because of the use of an incorrect model, the effect 
is to be seen on the estimates of the observational residuals v t . If the . 
residuals are too small, too much signal has been taken out. If they are too 
large, or they just show some systematic pattern, then part of the signal 
has not been detected. In general, the inconsistency of residuals with their 
a priori statistics strongly indicates the use of an inappropriate model. 

This effect manifests itself during computations associated with filtering in 
what is commonly called "filter divergence" [ Jazwinski, 1970, p. 302]. 

In a global (nonsequential) solution, the parameters p determining 
the statistics of the unknown disturbing function u t must be varied by "trial 
and error" until consistency of residuals with their a priori statistics is 
reached. This presupposes that first, the parameters p give rise to a class 
of statistics for the process u t which is wide enough, so as to contain an 
element close to the "true" statistics. The second assumption is that the 
a priori statistics of the observational errors are accurately known through 
calibration and standard statistical techniques. 
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Related to global adaptivity are techniques, such as the DMC algo- 
rithm, of local or real time adaptivity. These techniques aim at modifying 
the state noise covariance matrix during each step of the filtering process, 
in order to overcome the filter divergence; problem. Such techniques are 
discussed in [Jazwinski, 1970, Chapter 8, especially Section 11; and Gelb, 
1974, Sections 8.1 and 9.1]. 

Jazwinski [1969] considers a discrete linear dynamical system model 
x k +: =N$k+i x* + G k w k , ,, . . . (5. 73) 

with observations 

y k = H k x k + v k (5.74) 

where the state noise w k is a zero mean white Gaussian sequence, . 
accounting for errors made in modeling the dynamics of the system. The 
state noise covariance matrices Q k are considered to be the same over 
every N observations 

Q k +i = Q k , N > i = l,2,...,N 

and an algorithm is devised for determining Q k>N so that the produced 
residuals are consistent with their a priori statistics. 

Another approach very much similar to the DMC algorithm, is • 
given in [ Jazwinski, 1974]. The adopted system model is 

x t = f(x t ,t) +. G Ut • , . (5.75) 

and u t is modeled over each interval [t t ,t 1+1 ] between observations as 

u t = u { + (t-ty) (5.76) 

- t , 

where is a sequence of random variables with fixed a priori uncertainty. 
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The discrete state equations of the form 


x i+1 = F(xj,u t ) t,stst w (5.77) 

" are augmented by 

u 1+1 = u t + fa (t 1+l -t t ) (5.78a) 

0i+i = & <j3 t = 0 for t t £t£t 1+1 ) (5.78b) 

and the problem is solved with the help of the extended Kalman filter 
algorithm. 
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-6. SUMMARY AND' RECOMMENDATIONS , 

The objective of this work has been twofold: First, to clarify the 
mathematical and probabilistic background of standard linear estimation 
techniques used in geodesy and to reveal their interrelationship. Secondly, 
to address what we. considered to be the two most important estimation 
problems in geodesy: the norm choice problem in gravimetric collocation, 
and the adaptive determination of the stochastic models for not directly 
observable physical processes. 

Linear least squares adjustment and linear least squares prediction 
have been shown to reduce to linear best approximation problems. In the • 
former, the observations are best approximated from elements of the 
"model space, " while in the latter, the unknown parameters are best 
approximated from elements of the "data space." Probabilistic concepts 
in least squares adjustment have been shown to refer only to the definition 
of the .metric for the approximation, while in least squares prediction the 
structure of the approximation space itself (space of second-order random 
variables) in addition relies on probabilistic ideas. However, for the case 
of Hilbert space valued random variables, least squares prediction has been 
related to deterministic (exact) collocation. Estimate optimality criteria 
(minimum error bounds) have been identified for the single parameter predic- 
tion as opposed to the global minimum norm solution for the unknown function. 

Kalman-Bucy filtering techniques have been shown to reduce in the 
case of discrete observations to least squares adjustment (for filtering- 
smoothing) and least squares prediction (for prediction). . 

The use of stochastic models has. been investigated for the two most 
important physical processes related to geodetic work: the gravity field 
and the rotation of the earth . Motivated by Lauritzen's proof of nonergodicity 
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for the gravity field of the earth, a deterministic criterion for optimality of 
related- predictions has been introduced leading to a proposed solution of the 
optimal norm choice problem in (exact) collocation. In the case of the rotation 
of the, earth, the relation of stochastic models for polar motion and diurnal 
rotation to those for the excitation function has been shown,, through the use 
of the linearized Liouville equation for both cases:with and without damping. 

The possibility of constructing simple stochastic models from white 
noise has been explored, and the conditions for stationarity of the output of 
first-order autoregressive models, excited by white noise, have been 
established. , • 

The Dynamic Model Compensation algorithm has been generalized, 

and its adaptive structure and inherent approximations have been clarified. 

Other possibilities for adaptively estimating both state and unknown effects 

v 

on dynamical systems and the nature of the optimality of such estimates 
have been explored. ■ • - • • . • ' 

Of course a great deal has yet to be done both .with respect to. 
gravimetric problems and to the adaptive estimation in geometric- 
dynamic geodesy . 

* The computational feasibility of obtaining an optimal model covariance 
or norm, using minimum error -bound criteria, remains to be demonstrated. 

A way must be found to include the effect of observational errors. 

Adaptive estimation techniques, because of their vary nature 
(adaptivity to observational evidence), can only be compared and justified in 
connection with real problems. The success of the Dynamic Model Compensa- 
tion algorithm in estimating urimodeled accelerations on satellites encourages 
the use of this and similar adaptive algorithms, especially for VLBI observa- 
tions when estimating .the unknown excitation function giving rise to earth 
rotation. 
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